DOCUMENT RESUME

ED 154 018                                        IB 001 028

AUTHOR          Gustafsson, Jan-Eric
TITLE           The Rasch Model for Dichotomous Items: Theory,
                Applications and a Computer Program. No. 63.
INSTITUTION     Gothenburg Univ. (Sweden). Inst. of Education.
PUB DATE        Dec 77
NOTE            158p.; Best copy available
EDRS PRICE      MF-$0.83 HC-$8.69 Plus Postage.
DESCRIPTORS     *Computer Programs; Equated Scores; *Goodness of Fit;
                *Item Analysis; *Mathematical Models; *Reliability;
                *Scores; Standard Error of Measurement; Statistical
                Analysis; Test Construction; Test Interpretation;
                Test Items; True Scores
IDENTIFIERS     Latent Trait Theory; *Rasch Model; Tailored Testing



ABSTRACT

The Rasch Model for test analysis is described and compared with
two-parameter and three-parameter latent-trait models. Conditional
maximum likelihood equations for estimating item parameters are
derived, and estimates of person (ability) parameters are described
together with their confidence intervals. Goodness of fit tests are
discussed, including a graphic test of item fit and two over-all
tests. Characteristics of tests which may cause them not to fit the
model are listed, together with strategies for developing tests.
Applications of the Rasch model to optimizing test efficiency, test
equating and linking, and to tailored testing are described. Some
generalizations of the basic model and special cases are mentioned.
PML, a FORTRAN IV computer program used in applying the Rasch model,
is described in detail, and a number of bibliographical references
are appended. (CTH)



Reproductions supplied by EDRS are the best that can be made from the
original document.



CONTENTS

ACKNOWLEDGEMENTS

ABSTRACT

INTRODUCTION

Chapter 1  BASIC CONCEPTS AND MODELS IN LATENT-TRAIT THEORY

    1.1 Three logistic models
          The one-parameter model
          The two- and three-parameter models
    1.2 Assumptions underlying the LT models
          Unidimensionality
          Local statistical independence
          The form of the item characteristic curve
    1.3 The Rasch model versus the other models
          Interpretability
          Estimation of parameters
          Testing assumptions
          Conclusion

Chapter 2  THE MATHEMATICS OF THE RASCH MODEL

    2.1 Estimating item parameters
          The computation of the symmetric functions
          The convergence of the iterations
    2.2 Estimating person parameters
    2.3 The information function and confidence intervals for the
        estimated parameters
          Confidence intervals for the item parameters
          Confidence intervals for the person parameters
          The index of subject separation

Chapter 3  TESTING GOODNESS OF FIT TO THE RASCH MODEL

    3.1 Testing item fit
    3.2 Overall tests of goodness of fit
          The Andersen conditional likelihood ratio test
          The Martin-Löf chi-square test
          The Martin-Löf test versus the Andersen test
    3.3 Redundancy

Chapter 4  CONSTRUCTING RASCH SCALES

    4.1 Analyses of two tests of PMA type
          Number series
          Opposites
    4.2 Item bias in Opposites
    4.3 Discussion
          Sources of threat against the model
          Strategies and problems in the development of Rasch scales
          Degree of fit and inferential tests
          The concept of unidimensionality

Chapter 5  SOME AREAS OF APPLICATION

    5.1 Test optimation
    5.2 Tailored testing
    5.3 Test equating and linking
          Test equating
          Test linking
    5.4 Item banks

Chapter 6  GENERALIZATIONS OF THE RASCH MODEL

    6.1 The polychotomous case
    6.2 The linear logistic model
    6.3 Analyses of experimental data

Chapter 7  THE PML PROGRAM

    7.1 The two versions of PML
    7.2 Obtaining a copy of the program
    7.3 Using PML
          How to use the OSIRIS version
          How to use the non-OSIRIS version
    7.4 The most important subroutines
    7.5 Dimensioning of the program
    7.6 A sample printout
    7.7 The source code of the non-OSIRIS version of PML

REFERENCES






ACKNOWLEDGEMENTS 



The research presented in this report has been financially
supported by the Swedish Council for Research in the Humanistic
and Social Sciences and by the National Board of Education
and has been carried out at the Institute of Education,
University of Göteborg.

I should like to express my gratitude to some friends and
colleagues, most of them at the Institute of Education and
some of them at the Department of Educational Research,
University of Göteborg, for the interest they have shown and for
their great help. First of all I wish to thank Leif Lybeck who
introduced me to the Rasch model, and throughout the work I have
greatly profited from his expertise in the field. Without Jan-Gunnar
Tingsell's great skill in the art of computer programming
there would probably not have been any computer program to
present; not only has he helped me track down errors but he has
also contributed parts of the program. I also owe debts of
gratitude to Kjell Härnqvist, Torsten Lindblad, Berner Lindström
and Inga Wernersson who read a first draft of the manuscript
and all gave valuable comments. Christina Skönvall had
the arduous task of typing the final manuscript; I wish to
thank her for a great job.



Mölndal, December 1977



Jan-Eric Gustafsson 



ABSTRACT



The report describes the Rasch model for dichotomous items,
or the one-parameter logistic model, which is the simplest
of the psychometric latent trait models. In the Rasch model
each item is described with only one parameter, the difficulty,
and each person is described with only one parameter, the
ability. In Chapter 1 the basic features of the model are
spelled out and a comparison is made with other, more complex,
latent trait models. It is concluded that the Rasch model
has decisive advantages over the other models with respect
to interpretability, estimation of parameters and possibilities
of testing assumptions. Chapter 2 shows how conditional
maximum likelihood equations for estimating the item parameters
can be derived, and explains how the numerical
problems in solving these equations have been overcome in a
computer program so that estimates can be obtained even for
large sets of items. The same chapter also deals with the
estimation of person parameters and how to establish confidence
intervals for the estimated parameters.

In Chapter 3 goodness of fit tests based on the conditional
estimates of the item parameters are presented. A graphic
test of item fit is described and two overall numerical tests
are taken up: one likelihood ratio test and one chi-square
test. In Chapter 4 strategies and problems in developing
scales fitting the model are discussed in relation to analyses
of some tests developed within the framework of the classical
psychometric theory.

Chapter 5 presents some areas of application of the Rasch
model such as test optimation, test equating and linking, and
tailored testing. In Chapter 6 some generalizations of the
basic model are briefly taken up; it is mentioned that models
can also be formulated for the case when there are more than
two categories of answer and that a general linear logistic
model can be used to study the sources of item difficulty. In
Chapter 7, finally, the computer program is presented.



INTRODUCTION 

In a discussion about prospective developments in item selection
theory for the construction of mental tests Gulliksen
(1950) stated that: "A significant contribution to item analysis
theory would be the discovery of item parameters that
remained relatively stable as the item analysis group
changed ..." (p. 392).

This problem has been solved, along with several others,
within a class of models generally referred to as latent trait
models (LT models, or modern test theory; other names sometimes
applied are item response theory and item characteristic
curve theory).

For different reasons, among which the mathematical and numerical
complexities involved are probably the most important,
LT models have not yet been widely applied in the development
and use of tests, even though the last few years have shown
some evidence of a change.

There is in particular one LT model, variously referred to as
the Rasch model or the one-parameter logistic model, which
has been applied in solving practical problems and which
holds special promise for further use. This report presents
the Rasch model and indicates at least a selection of its
possible uses. Also presented is a computer program for conditional
maximum likelihood estimation of parameters in the
model and for computing goodness of fit tests.



Chapter 1

BASIC CONCEPTS AND MODELS IN LATENT-TRAIT THEORY



Although the basic tenets of LT theory can be found in early
work by Lawley (1943) and Lord (1952, 1953), the breakthrough
came in the sixties. (For measurement of attitudes, however,
Lazarsfeld very early formulated and used the closely related
latent class model, see e.g. Lazarsfeld, 1950.) During this
decade Rasch (1960, 1966) formulated his model and the computational
problems in relation to the model began to be mastered
as well (Fischer & Allerup, 1968; Wright & Panchapakesan,
1969). The sixties also saw the advent of the Lord and Novick
(1968) treatise in which five chapters (four of which were
contributed by A. Birnbaum) dealt with LT theory.

In the last ten years a host of papers dealing with specific
questions has also appeared, along with rather simple, relatively
non-mathematical introductions to LT theory
(e.g. Lybeck, 1974; Willmott & Fowles, 1974; Kifer, Mattson
& Carlid, 1975; Baker, 1977; Hambleton et al., 1977) and
at least one proper textbook presentation (Fischer, 1974).

1.1 Three logistic models

Common to all LT models is that one set of parameters is used
to describe the items in a test and that another, single, parameter
represents the ability of each examinee. An underlying
psychological trait or latent continuum is thus assumed on which
the standing of the examinees differs. Another thing common to
all LT models is that a function relating the probability of a
correct answer on an item to ability is explicitly stated (the
item characteristic curve, ICC).

The differences between the models reside in the particular
choice made of parameters describing the items and the kind
of function used for the ICC. Two kinds of ICCs have been
tried, the normal ogive and the logistic function. However,
since the logistic function is mathematically and computationally
much more tractable than the normal ogive, the three
most commonly used models are all based on the logistic
function, with the difference between the models residing
in the number of parameters used to describe the items.

The one-parameter model

In the simplest case only one parameter is used for each item,
its difficulty. In order to describe this model we will need
the following notation:

  $\sigma_i$    = The difficulty parameter of item i.

  $\xi_v$       = The ability parameter of person v.

  $f_i(\xi)$    = The ICC for item i.

  $A_{vi}$      = A binary response variable with the
                  value 1 if the answer of person v to
                  item i is correct and the value 0 if
                  incorrect or omitted. A particular
                  realization of this stochastic
                  variable is given the algebraic
                  notation $a_{vi}$.

  $k$           = The number of items in the test.



The one-parameter model, or the Rasch model, asserts that the 
probability of a correct answer by person v to item i is: 



(1.1.1)   $P(A_{vi} = 1 \mid \xi_v, \sigma_i) = \dfrac{\exp(\xi_v - \sigma_i)}{1 + \exp(\xi_v - \sigma_i)}$



The higher the value of $\xi_v$ the higher the probability of a
correct answer and the higher the value of $\sigma_i$ the lower the
probability of a correct answer.
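
The behaviour of model (1.1.1) can be checked numerically. The following is a minimal sketch in Python; the function name and the ability and difficulty values are chosen here for illustration only and are not taken from the report or from the PML program.

```python
import math


def rasch_probability(ability, difficulty):
    """Probability of a correct answer under model (1.1.1).

    Written as 1 / (1 + exp(-(ability - difficulty))), which is
    algebraically the same as exp(x) / (1 + exp(x)).
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


# Hypothetical abilities and difficulties, for illustration only.
for ability in (-2.0, 0.0, 2.0):
    row = [round(rasch_probability(ability, d), 3) for d in (-1.0, 1.0)]
    print(ability, row)
```

Each printed row holds the probabilities for an easy and a hard item; the probability rises with ability and falls with difficulty, and equals one half when ability and difficulty coincide.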



From (1.1.1) it follows that the ICC for an item i in the Rasch
model can be written:

(1.1.2)   $f_i(\xi) = \dfrac{\exp(\xi - \sigma_i)}{1 + \exp(\xi - \sigma_i)}$



Two ICCs for this model are shown in Figure 1.1. Throughout,
the curve for the more difficult item is located under the
curve for the easier item.

Figure 1.1. Item characteristic curves for two items ($\sigma_1 = -1$, $\sigma_2 = 1$) in the
one-parameter logistic model.



The question may of course be asked as to what the parameters
in the model mean and what reality this model may represent. A
very concrete example which illustrates this is the Flogging
Wall test invented by Lumsden (1976) as a tool for thought
experiments in test theory and as a "test for test theorists"
(p. 251).

Along a wall at intervals there are k flexible canes attached
at various heights. The canes flog slowly and independently
up and down. In taking the test the examinee is placed on a
cart which is drawn quickly along near the wall and the
examinee's score, to be used as a measure of height, is the
number of canes which touch him (see Figure 1.2).

With one assumption, namely that the canes flog with the same
amplitudes, this test would fit the Rasch model, with the
height of the examinee as the ability parameter $\xi_v$ and the
heights of the canes on the wall as the item difficulties
$\sigma_i$. As will be shown later the Rasch model furthermore
implies that the examinee scores and the cane scores (i.e.
the number of examinees which a cane touches) can be used to
obtain separate estimates of the parameters.

The two- and three-parameter models 

In the two-parameter model (or the Birnbaum model as it is
sometimes called) another parameter ($\alpha_i$, $i = 1, \ldots, k$), the
discrimination parameter, is introduced which allows the
ICCs for different items to have different slopes. The
ICC for an item in this model can be written:

(1.1.3)   $f_i(\xi) = \dfrac{\exp[\alpha_i(\xi - \sigma_i)]}{1 + \exp[\alpha_i(\xi - \sigma_i)]}$

Two ICCs for the two-parameter model are shown in Figure 1.3.
For the item with a high discrimination parameter the slope
is steep, while it is much more shallow for the item with
the low parameter.

We can use the Flogging Wall test to illustrate the meaning
of the discrimination parameter too. This parameter would
reflect differences in amplitude of the flogging of the
canes, i.e. with this model it would no longer be necessary
to assume that all the canes have the same amplitude. But it
should also be pointed out that with this model we should no
longer use the number of canes touching the examinee as an
estimator of his height (ability), but instead weight the
score on each item with its discrimination parameter.
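
A minimal sketch of the two-parameter ICC (1.1.3) and of the weighted score just described is given below in Python. The function names, the response pattern and the discrimination values are hypothetical and serve only to show how the weighted score differs from the raw number of correct answers.

```python
import math


def icc_two_parameter(ability, difficulty, discrimination):
    """ICC of the two-parameter logistic model, equation (1.1.3)."""
    z = discrimination * (ability - difficulty)
    return 1.0 / (1.0 + math.exp(-z))


def weighted_score(responses, discriminations):
    """Sum of the item scores, each weighted by its discrimination."""
    return sum(a * x for a, x in zip(discriminations, responses))


# Hypothetical response pattern (1 = correct) and discriminations.
responses = [1, 0, 1]
discriminations = [0.5, 1.5, 1.0]

print(sum(responses))                              # raw score
print(weighted_score(responses, discriminations))  # weighted score
```

When all discrimination parameters are equal to 1 the weighted score reduces to the raw score, which is the situation assumed in the one-parameter model.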



Figure 1.3. Item characteristic curves for two items ($\sigma_1 = -1$, $\sigma_2 = 1$; $\alpha_1 = .5$, $\alpha_2 = 1.5$)
in the two-parameter logistic model.



Let us return to Figure 1.3 for a moment. Inspection of this
figure (observe that only a part of the ability continuum is
shown) shows that for low values on the ability continuum
the probability of a correct answer asymptotically approaches
0. This obviously implies that this model, no more than the
one-parameter model, can be expected to properly represent
the case when the items allow guessing.

A third model, the three-parameter model, has been proposed
in which another parameter ($\pi_i$, $i = 1, \ldots, k$) is introduced to
prevent the lower asymptote of the ICC from approaching zero. The
ICC for an item in this model can be written:

(1.1.4)   $f_i(\xi) = \pi_i + (1 - \pi_i)\,\dfrac{\exp[\alpha_i(\xi - \sigma_i)]}{1 + \exp[\alpha_i(\xi - \sigma_i)]}$






Inspection of Figure 1.4, where two ICCs for the three-parameter
model are depicted, reveals that the curves approach
the value of $\pi_i$ as the lower asymptote (compare the graphs
for item 2 in Figures 1.3 and 1.4). Since the lower asymptote
can be taken as the probability of obtaining a correct
answer by guessing, the parameter $\pi_i$ is often referred
to as the guessing parameter. It can be noted, however,
that the estimates of the parameter typically come out
lower than the values that would result if examinees of low
ability were to guess randomly. For this reason, which Lord
(1974a) has attributed to there often being "too attractive"
distractors, it has been argued that the parameter should not
be labelled "guessing parameter" but should instead be considered
simply as the lower asymptote of the ICC.
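
The role of the lower asymptote in (1.1.4) can be illustrated with a short Python sketch. The function name is introduced here for illustration, and the parameter values are hypothetical; the value .25 corresponds to random guessing among four alternatives.

```python
import math


def icc_three_parameter(ability, difficulty, discrimination, lower_asymptote):
    """ICC of the three-parameter logistic model, equation (1.1.4)."""
    z = discrimination * (ability - difficulty)
    logistic = 1.0 / (1.0 + math.exp(-z))
    return lower_asymptote + (1.0 - lower_asymptote) * logistic


# Hypothetical item: difficulty 1, discrimination 1.5, lower asymptote .25.
for ability in (-6.0, -2.0, 1.0, 4.0):
    print(ability, round(icc_three_parameter(ability, 1.0, 1.5, 0.25), 3))
```

For very low abilities the probability settles at the lower asymptote instead of zero, and at the point where ability equals difficulty it lies halfway between the lower asymptote and 1.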




Figure 1.4. Item characteristic curves for two items
($\sigma_1 = -1$, $\sigma_2 = 1$; $\alpha_1 = .5$, $\alpha_2 = 1.5$; $\pi_1 = \pi_2 = .25$) in the three-parameter
logistic model.



1.2 Assumptions underlying the LT models



All applications of LT models imply that in one step or 
another parameters included in the particular model chosen 
are estimated from the responses of a group of persons to a 
set of items. These parameters have a number of desirable 
properties and when they are at hand a number of problems 
can be solved which would even be difficult to formulate 
under the classical approach to test theory (see chapter 5). 

However, there are a number of assumptions that must be fulfilled
in order for any reasonable estimates of parameters
to be achieved, and any sensible application to be made. The
three most important assumptions are those pertaining to
the dimensionality of the latent space, the principle of
local statistical independence and the form of the item
characteristic curve.

Unidimensionality 

The three latent trait models spelled out above, and several 
others, are all based on the assumption that there is only 
one ability underlying examinee performance. The meaning of 
this assumption can be explained as follows (Hambleton et al.,
1977): Suppose that a test of k items is to be used in r
subpopulations of examinees (an example for r=2 is one group
of boys and one group of girls). For any particular given
ability level the conditional distributions of test scores
must then be identical if the test is unidimensional. If,
however, the conditional distributions vary between the subgroups
this can only be because the test is measuring something
more than a single ability.

With respect to certain tests in common use, the assumption
of unidimensionality is certainly untenable. It can, however,
be claimed that a test should be unidimensional since
the resulting scores are otherwise more or less meaningless
(Lumsden, 1961, 1976). McNemar (1946, also quoted in
Lumsden, 1976) expressed this in the following way:

"Measurement implies that one characteristic at a time
is being quantified. The scores on an attitude scale
are most meaningful when it is known that only one
continuum is involved. Only then can it be claimed
that two individuals with the same score or rank can
be quantitatively and, within limits, qualitatively
similar in their attitude towards a given issue."
(p. 268).

The same line of reasoning certainly also applies in the
measurement of abilities.

It can be asked how one can make sure that the items intended
to constitute a test are unidimensional. Factor analysis of
the items is a method that has been used to investigate the
number of dimensions involved in taking a test. Lumsden
(1961, 1976), specifically, has argued in favor of this
method when attempts are made to construct unidimensional
tests and several authors have reported applications of
factor analysis to assure unidimensionality before proceeding
with an LT model.

The method is, however, not without its problems. One problem
pertains to the choice of measure of association between
the items. The phi-coefficient is commonly used but this
measure has the unfortunate characteristic that there are
limits on the numerical values it can attain, with the limit
varying as a function of the marginal frequencies of the
items. A consequence of this may be that even a strictly
unidimensional test may appear as multidimensional in the factor
analysis (Ferguson, 1941).
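
The dependence of the attainable phi-coefficient on the marginal frequencies can be made concrete with a small Python sketch. It uses the standard definition of phi for a fourfold table; the function names and the proportions are chosen here for illustration and do not come from the report.

```python
import math


def phi_coefficient(p_both, p_first, p_second):
    """Phi for two dichotomous items.

    p_first and p_second are the proportions answering each item
    correctly, p_both the proportion answering both correctly.
    """
    numerator = p_both - p_first * p_second
    denominator = math.sqrt(
        p_first * (1.0 - p_first) * p_second * (1.0 - p_second)
    )
    return numerator / denominator


def max_phi(p_first, p_second):
    """Largest phi attainable for the given marginal proportions.

    The joint proportion cannot exceed the smaller marginal, so the
    maximum is reached when p_both equals min(p_first, p_second).
    """
    return phi_coefficient(min(p_first, p_second), p_first, p_second)


print(max_phi(0.5, 0.5))  # items of equal difficulty
print(max_phi(0.2, 0.8))  # one hard item and one easy item
```

Two items of equal difficulty can reach a phi of 1, whereas for a hard item (20 per cent correct) paired with an easy item (80 per cent correct) the ceiling is only .25, even if everyone who solves the hard item also solves the easy one.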

The tetrachoric correlation is another measure of association 
that has been used in factor analyses at the item level and 
which is not limited as to the values it can attain. However, 
matrices of tetrachoric correlations are often not positive 
definite with breakdowns of the analyses as a common con- 
sequence. 






Another problem when factor analysis is used to investigate
the dimensionality of a set of items is that unless there are
differences in the levels of abilities among the examinees
the ratio of the first to the second principal component of
the matrix of inter-item correlations will not be large, as
is dictated by the assumption of unidimensionality. Since LT
models can fruitfully be applied even in the case when all the
examinees have the same ability this restriction in the
applicability of factor analysis is unfortunate.

Even though the problems mentioned above do not wholly invalidate
the use of factor analysis before LT models are
applied, it cannot be allowed to give the final verdict. The
problem is not very serious, however, since the assumption
of unidimensionality, along with the other assumptions, can
be tested with the LT models themselves through goodness of
fit tests.

Local statistical independence 

The assumption of local independence implies that the answer 
of an examinee on one item must not influence his answer on 
another item. For any two items, i and j, this can be given 
the following statistical formulation: 

(1.2.1)   $P(A_i = 1 \text{ and } A_j = 1 \mid \xi) = P(A_i = 1 \mid \xi)\, P(A_j = 1 \mid \xi)$

That is, for a given ability level the probability of getting
two given items correct must be equal to the product of the
probabilities of getting each one of them correct.
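
The word "local" in (1.2.1) matters: independence holds at a fixed ability level, not in a group where abilities differ. The Python sketch below illustrates this under the Rasch model; the function names, the two ability levels and the item difficulties are hypothetical values used only for the illustration.

```python
import math


def rasch_probability(ability, difficulty):
    """Probability of a correct answer under the Rasch model."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def joint_probability(ability, difficulty_i, difficulty_j):
    """P(both items correct | ability), using the product rule (1.2.1)."""
    return (rasch_probability(ability, difficulty_i)
            * rasch_probability(ability, difficulty_j))


def marginal_summary(abilities, difficulty_i, difficulty_j):
    """Average the probabilities over equally likely ability levels."""
    n = len(abilities)
    p_i = sum(rasch_probability(a, difficulty_i) for a in abilities) / n
    p_j = sum(rasch_probability(a, difficulty_j) for a in abilities) / n
    p_ij = sum(joint_probability(a, difficulty_i, difficulty_j)
               for a in abilities) / n
    return p_i, p_j, p_ij


# Hypothetical group with two ability levels and two items.
p_i, p_j, p_ij = marginal_summary([-1.0, 1.0], 0.0, 0.5)
print(p_ij - p_i * p_j)  # positive: the items covary in the mixed group
```

In the mixed group the joint probability exceeds the product of the marginal probabilities, so the items are correlated even though they are independent at each ability level taken separately.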

Hambleton et al. (1977) pointed out that the assumption of 
local statistical independence for the case when the ability 
continuum is unidimensional is equivalent to the assumption 
of unidimensionality. They argued that, for a fixed ability 
level, if the responses are not statistically independent, 
some examinees have higher expected scores than others. Consequently
more than one ability would be necessary to account
for test performance.



As a consequence of the equivalence of the two assumptions,
what was said above about the testing of the assumption of
unidimensionality applies to the testing of the assumption
of local statistical independence as well. But it must of
course be realized that the kind of action to be taken differs,
depending upon which assumption has been violated.

The form of the item characteristic curve

All LT models have in common that a choice must be made as 
to what kind of ICC to operate with. If this was not done, 
it would be impossible to formulate the statistical models 
out of which the equations for estimation of parameters can 
be determined. 

Of course the function relating the probability of a correct
answer to an item to ability can take any form; it need not
even be continuous (cf. "latent class analysis", Lazarsfeld
& Henry, 1968). Thus it is always necessary to test the
particular assumption made, which can usually be done through
applications of goodness of fit tests.

The three logistic models spelled out above differ with res- 
pect to the constraints put on the form of the characteristic 
curve, with the one-parameter model imposing the strongest 
assumptions and the three-parameter model imposing the least 
strong constraints. It is of course an empirical question 
whether, for a given set of data, a less constrained model 
is necessary or whether a more severely constrained one will 
do. But partly it is also a question of research strategy in
that it is sometimes possible to select from a larger pool
of items those that conform to the requirements of the more
constrained model.



1.3 The Rasch model versus the other models



The different LT models all have their strengths and weaknes- 
ses and they are not all equally applicable to all types of 
problems. The most important differences seem to reside, how- 
ever, between the Rasch model on the one hand and the two- 
and three-parameter models on the other.

The most important drawback of the Rasch model is that it is
built on such strong assumptions that it could be argued that
the opportunities for using this model are small. It has, however,
been shown that it is by no means an impossible task to
find existing tests that do fit the model (e.g. Rasch, 1960) and
tests can of course be specifically constructed to conform
to the requirements of the model. The reason that this might
be a preferable strategy is that the Rasch model in many
respects has decisive advantages over the other models. These
advantages are discussed below.



Interpretability

The size of the item and person parameters can in the Rasch
model be given simple interpretations in terms of odds of
success on an item. The probability of success on item i for
person v (for simplicity this probability will be called $P_{vi}$)
is:

(1.3.1)   $P_{vi} = \dfrac{\exp(\xi_v - \sigma_i)}{1 + \exp(\xi_v - \sigma_i)}$

Evidently the odds of success can be written:

(1.3.2)   $\dfrac{P_{vi}}{1 - P_{vi}} = \exp(\xi_v - \sigma_i)$

If we now relate the odds of success for person v to the odds
of success for person u on the same item this can be written:

(1.3.3)   $\dfrac{P_{vi}/(1 - P_{vi})}{P_{ui}/(1 - P_{ui})} = \dfrac{\exp(\xi_v - \sigma_i)}{\exp(\xi_u - \sigma_i)}$

which can be simplified into:

(1.3.4)   $\dfrac{P_{vi}/(1 - P_{vi})}{P_{ui}/(1 - P_{ui})} = \exp(\xi_v - \xi_u)$



We thus see that when two persons are compared this does not
involve the item parameter at all and it can easily be shown
that when two items are compared, the comparison does not involve
the abilities of the persons. These possibilities for
comparing persons independently of items, and items independently
of persons, form the core of Rasch's theory of "specific
objectivity" (Rasch, 1960, 1961, 1966) and it can be shown that
the one-parameter logistic model is necessary and sufficient
to obtain this kind of objectivity.
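
The corresponding comparison of two items can be written out in the same way as (1.3.3) and (1.3.4). The following is a sketch in the notation of this chapter, assuming two items i and j answered by the same person v:

```latex
\frac{P_{vi}/(1-P_{vi})}{P_{vj}/(1-P_{vj})}
  = \frac{\exp(\xi_v-\sigma_i)}{\exp(\xi_v-\sigma_j)}
  = \exp(\sigma_j-\sigma_i)
```

The ability $\xi_v$ cancels, so the relative odds depend only on the difference in difficulty between the two items.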

Behavioral scientists are probably more conversant with the
additive linear model upon which the analysis of variance and
related models are built than with the exponential family of
models. Framed in the language of the linear additive model,
however, it can be said that the Rasch model is a model that
does not allow for any interactions, i.e. the difficulty of
an item must not be qualified by the conditions under which
it is taken or by which person takes it. On the other hand
it is of course quite difficult to imagine items the difficulties
of which are immanent to such a degree that they
will never be qualified by any factor. The boundary conditions
for a set of items to conform to the model should thus
be sought, which in a sense is done each time the model is
applied and a goodness of fit test is computed.



Equation (1.3.4) above not only says that persons can be compared
independently of items but can also be used to compute
the relative odds of success on any item for any two persons.
When the persons have the same ability the relative odds are
1 since exp(0) = 1. If, to take another example, person v has
the person parameter 2.0 and person u the parameter -1.0 the
relative odds of success on any item in favour of person v
are about 20 since exp(3) = 20.1.
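
The worked example can be reproduced with a few lines of Python. This is a minimal sketch; the function names and the item difficulties are chosen here only to show that the ratio of the odds is the same for every item.

```python
import math


def odds_of_success(ability, difficulty):
    """Odds of success P / (1 - P) under the Rasch model, equation (1.3.2)."""
    return math.exp(ability - difficulty)


def relative_odds(ability_v, ability_u):
    """Relative odds of person v versus person u, equation (1.3.4)."""
    return math.exp(ability_v - ability_u)


# Person v has ability 2.0 and person u ability -1.0, as in the text.
print(round(relative_odds(2.0, -1.0), 1))

# The ratio of the odds is the same whatever the item difficulty.
for difficulty in (-1.0, 0.0, 2.0):
    ratio = odds_of_success(2.0, difficulty) / odds_of_success(-1.0, difficulty)
    print(difficulty, round(ratio, 1))
```

Every line printed by the loop shows the same ratio as the first line, which is the numerical counterpart of comparing persons independently of items.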

The same type of calculations can of course also be applied
in the comparison of items. Only the Rasch model allows this
kind of simple probability statements and simple comparisons
between items and persons.

Estimation of parameters

All the LT models have separable parameters which can, at
least in principle, be estimated on scales that are independent
of the particular sample of examinees studied. The
theoretical and practical problems connected with the estimation
of parameters have, however, been adequately solved
only for the Rasch model.

The common approach that has been used to derive the equations
for the estimation of parameters is the maximum likelihood
(ML) method. However, a straightforward ML approach, resulting
in equations in which the item parameters and the person
parameters are estimated simultaneously, yields estimates which
are not consistent, as has been shown by Andersen (1973a)
and Martin-Löf (1973) (see chapter 2 below for further details).

This problem arises when structural parameters (the item pa- 
rameters) are estimated in the presence of incidental para- 
meters (the person parameters). Increasing the sample size 
obviously does not solve the problem since each new person 
brings a new incidental parameter. But it has been shown
(Andersen, 1973) that if the likelihood equation can be formulated
only in the item parameters, then consistency and
unbiasedness are assured, which can be done if there exists a
minimal sufficient statistic for the person parameters. In
the Rasch model, and only in the Rasch model, the total score can
be shown to be such a statistic for ability. Thus it is
possible to formulate ML estimators for the item parameters
through conditioning on the total score in the Rasch model
but not in the other models.
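
The effect of conditioning on the total score can be illustrated by brute force for a very short test. The Python sketch below is not the algorithm used in the computer program described in this report; it simply enumerates all response patterns, and the function names, item difficulties and response pattern are hypothetical.

```python
import math
from itertools import product


def pattern_probability(pattern, ability, difficulties):
    """Probability of a full response pattern under the Rasch model,
    assuming local independence between the items."""
    probability = 1.0
    for response, difficulty in zip(pattern, difficulties):
        p = 1.0 / (1.0 + math.exp(-(ability - difficulty)))
        probability *= p if response == 1 else 1.0 - p
    return probability


def conditional_pattern_probability(pattern, ability, difficulties):
    """Probability of the pattern given its total score."""
    score = sum(pattern)
    same_score_total = sum(
        pattern_probability(other, ability, difficulties)
        for other in product((0, 1), repeat=len(difficulties))
        if sum(other) == score
    )
    return pattern_probability(pattern, ability, difficulties) / same_score_total


# Hypothetical three-item test and one response pattern with score 2.
difficulties = [-1.0, 0.0, 1.5]
pattern = (1, 0, 1)
for ability in (-2.0, 0.0, 3.0):
    value = conditional_pattern_probability(pattern, ability, difficulties)
    print(ability, round(value, 6))
```

The printed conditional probability is the same for every ability: once the total score is given, the ability carries no further information about which items were answered correctly, so a likelihood built from these conditional probabilities contains the item parameters only.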

In spite of the fact that the conditional maximum likelihood
(CML) approach is the correct one for estimating the item
parameters in the Rasch model, the unconditional (UML) approach
in which item parameters and person parameters are estimated
simultaneously is the one that has most commonly been used
(Wright & Panchapakesan, 1969; Wright & Mead, 1977; Wright &
Douglas, 1977). The reason for this is that the CML method
is computationally cumbersome and that numerical problems
have prevented its use on tests with more than 20-40 items.
The computer program presented in chapter 7 below does,
however, present a remedy since it can be used for CML
estimation of parameters for larger sets of items.

It can be pointed out parenthetically that there is some
confusion concerning the use of the terms conditional and
unconditional estimates in LT models. Unfortunately Bock and
Lieberman (1970) used these terms in a rather peculiar sense
deviating from common use in mathematical statistics. By
imposing assumptions about the distribution of person parameters
they were able to state the estimating equations for
the item parameters in the two-parameter model without
introducing the person parameters. These estimates were termed
unconditional estimates while they used the term conditional
estimates for those resulting when both sets of parameters
are estimated simultaneously. The terms conditional and
unconditional thus in a sense carry the opposite meaning in
the usage of Bock and Lieberman as compared to the usage
above in connection with the Rasch model. In the sequel of
this paper the latter meaning of the terms will be implied.

Summarizing the discussion so far it can be concluded that
only for the Rasch model are there solutions to the problem
of estimating the parameters which are theoretically completely
satisfying. (To be fair, however, it must be pointed out
that this is true only with respect to the estimation of the
item parameters; the unbiased estimation of abilities is still
a problem to be solved.) But aside from these theoretical
questions there are also important differences between the
Rasch model and the other models with respect to the amount
of practical problems met with in estimating the parameters.

Since solutions to the likelihood equations are not available
in closed form, numerical methods must be resorted to. However,
for the two-parameter model the iterative approach
employed does not converge properly unless both the number of
examinees and the number of items are large (at least 1000-3900
persons and 30-60 items seem to be required). The amount
of computer time required is also very great. Hambleton et al.
(1977, p. 107) report, for example, that for a test with 60
items given to 5305 examinees 40-60 minutes were required for
convergence on an IBM 360/65. Practical problems alone thus
put application of the two- and three-parameter models out of
reach for many researchers and for many problems.

For the Rasch model, however, the iterative procedure almost
never fails, and at least for well conditioned problems where
the number of items is not very large (less than 60 to 80
items, say) more than a few minutes on an IBM 360/65 is seldom
required, even when the CML estimates are computed.

Testing assumptions

Goodness of fit tests exist for all the different LT models
(Rasch, 1960; Wright & Panchapakesan, 1969; Andersen, 1973b;
Bock, 1972; Martin-Löf, 1973; Mead, 1976b). The tests generally
are of the chi-square or likelihood ratio type, and at
least for some of the proposed test statistics it has been
shown that they assume the specified distribution at least
asymptotically (Martin-Löf, 1973; Andersen, 1973b).

More important, perhaps, than the statistical properties of the
proposed tests are the differences between the different LT
models with respect to the possibilities of detecting important
deviations from the assumptions.

It appears that the Rasch model is "safer" in this respect
than the other two models. Mead (1976a) discussed factors
such as guessing, carelessness, speed, practice and item
bias as threats to the fit of data to the Rasch model. He
concluded by saying:

"All of the disturbances considered represent some
form of multidimensionality; they would violate any
model that assumes unidimensionality. Since the effect
of the disturbances often appears as a change in the
slope of the item characteristic curve, any model
which includes item discrimination as a parameter
would appear to fit the data." (Mead, 1976a, p. 11)

There is thus a risk in using the less constrained models,
since threats to the important assumption of unidimensionality
can be "taken care of" as varying item discrimination.

It is of course extremely important to test a more
constrained model with powerful means, since otherwise all
claims of superiority are invalidated. Fortunately there do
exist sound statistical tests of goodness of fit for the
Rasch model, at least when the CML approach is used (see
chapter 3 below).



Conclusion 

In the comparisons made between the Rasch model on the one
hand and the two- and three-parameter models on the other
with respect to interpretability, estimation of parameters
and testing assumptions, the Rasch model shows up more
favorably in every respect. If it can empirically be shown
that it is possible to make educational and psychological
measurements which conform to the requirements of the model
it will find a number of different uses. Some of the possible
applications will be discussed in chapter 5, after a more
detailed presentation of the mathematics of the model and
procedures for testing goodness of fit has been made.






Chapter 2
THE MATHEMATICS OF THE RASCH MODEL



In this chapter the structure of the Rasch model will be
set out more formally and it will be shown how the parameters
in the model can be estimated. But the presentation also
serves as documentation of the computer program (called
PML) presented in chapter 7; the solutions to some numerical
problems are presented in detail and operating characteristics
of the program are presented.



2.1 Estimating item parameters 

In developing the mathematics of the one-parameter model we
will make use of a somewhat different notation from that
used hitherto. The derivation is at points greatly simplified
if an antilogarithmic transformation is made of the
parameters such that $\theta_v = \exp(\xi_v)$ and $\varepsilon_i = \exp(-\sigma_i)$. The
probability of success for person v on item i can then be written:



(2.1.1)    $P(A_{vi}=1 \mid \theta_v, \varepsilon_i) = \dfrac{\theta_v \varepsilon_i}{1 + \theta_v \varepsilon_i}$



The usual testing situation is one in which n examinees have
been given k items. As previously we assume that the response
variable is of the Bernoulli type, so that in keeping
with the previous notation

$A_{vi} = \begin{cases} 1 & \text{if person } v \text{ is successful on item } i \\ 0 & \text{if person } v \text{ is not successful on item } i \end{cases}$



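Then we can write (2.1.1) as:

(2.1.2)    $P(A_{vi}=a_{vi} \mid \theta_v, \varepsilon_i) = \dfrac{(\theta_v \varepsilon_i)^{a_{vi}}}{1 + \theta_v \varepsilon_i}$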
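As a small numerical check of (2.1.1) and (2.1.2), the following Python sketch evaluates the response probability in the multiplicative notation and compares it with the logistic form in the additive parameters. The function name `rasch_prob` and the numerical values are illustrative assumptions and are not taken from the PML program.

```python
import math

def rasch_prob(theta, eps, a=1):
    """Probability of response a (1 = success, 0 = failure) as in (2.1.2)."""
    return (theta * eps) ** a / (1.0 + theta * eps)

# Illustrative (assumed) parameter values on the additive scale.
xi, sigma = 0.5, -0.3
theta, eps = math.exp(xi), math.exp(-sigma)

p_success = rasch_prob(theta, eps, 1)
p_failure = rasch_prob(theta, eps, 0)

# The same success probability written as a logistic function of xi - sigma.
p_logistic = 1.0 / (1.0 + math.exp(-(xi - sigma)))
```

The two response probabilities sum to one, and the multiplicative and additive forms agree, which is all the transformation is meant to achieve.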



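The observed data can be assembled in the matrix $((a_{vi}))$
shown below:

$$
\begin{array}{c|ccccc|c}
\text{Items} & \multicolumn{5}{c|}{\text{Examinees}} & \\
 & 1 & \cdots & v & \cdots & n & \\ \hline
1 & a_{11} & \cdots & a_{v1} & \cdots & a_{n1} & s_1 \\
\vdots & \vdots & & \vdots & & \vdots & \vdots \\
i & a_{1i} & \cdots & a_{vi} & \cdots & a_{ni} & s_i \\
\vdots & \vdots & & \vdots & & \vdots & \vdots \\
k & a_{1k} & \cdots & a_{vk} & \cdots & a_{nk} & s_k \\ \hline
 & r_1 & \cdots & r_v & \cdots & r_n &
\end{array}
$$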



The raw score for person v is thus:

$r_v = \sum_{i=1}^{k} a_{vi}$

and the total number of correct responses to item i (the
item score) is:

$s_i = \sum_{v=1}^{n} a_{vi}$



Those persons who have 0 or k correct answers must be excluded
from the $((a_{vi}))$ matrix since no estimates of their parameters
can be obtained. Items with 0 or n correct answers
must also be excluded from the matrix for the same reason.
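The marginal sums and the exclusion rule can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the matrix is stored with one row per person, and the trimming is repeated until nothing more is removed, which is an assumption added here because removing persons can leave an item with an extreme score.

```python
def trim_extremes(matrix):
    """Drop persons with raw score 0 or k and items with item score 0 or n.

    matrix[v][i] holds a_vi (rows = persons, columns = items).
    """
    rows = [list(r) for r in matrix]
    changed = True
    while changed and rows and rows[0]:
        changed = False
        k = len(rows[0])
        kept = [r for r in rows if 0 < sum(r) < k]
        if len(kept) != len(rows):
            rows, changed = kept, True
        if not rows:
            break
        n = len(rows)
        cols = [i for i in range(k) if 0 < sum(r[i] for r in rows) < n]
        if len(cols) != k:
            rows, changed = [[r[i] for i in cols] for r in rows], True
    return rows

def margins(rows):
    """Return the raw scores r_v and the item scores s_i."""
    raw_scores = [sum(r) for r in rows]
    item_scores = [sum(col) for col in zip(*rows)]
    return raw_scores, item_scores

data = [
    [1, 1, 1, 1],   # all items correct: excluded
    [1, 0, 0, 1],
    [0, 1, 0, 1],
    [0, 0, 0, 0],   # no item correct: excluded
    [1, 1, 0, 1],
]
trimmed = trim_extremes(data)
raw_scores, item_scores = margins(trimmed)
```

In this toy matrix the third item is never solved and the fourth is always solved by the remaining persons, so both are dropped, after which one more person becomes extreme and is dropped as well.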

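Under the assumption of statistical independence the likelihood
of the data matrix $((a_{vi}))$ is the product of the
probabilities of all the answers:

(2.1.3)    $\Lambda = \prod_{v=1}^{n} \prod_{i=1}^{k} \dfrac{(\theta_v \varepsilon_i)^{a_{vi}}}{1 + \theta_v \varepsilon_i} = \dfrac{\prod_{v=1}^{n} \theta_v^{r_v} \prod_{i=1}^{k} \varepsilon_i^{s_i}}{\prod_{v=1}^{n} \prod_{i=1}^{k} (1 + \theta_v \varepsilon_i)}$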



Looking at the likelihood function we find that only the
marginal sums of $((a_{vi}))$ are represented and not the interior
of the matrix. Thus we need not take into account which items
a certain examinee has answered correctly or which examinees
answered a certain item correctly. In other words, we find
that raw score is a sufficient statistic for the person
parameters and that item score is a sufficient statistic for the
item parameters.
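The sufficiency argument can be illustrated numerically: two data matrices with different entries but identical marginal sums must give the same value of (2.1.3). The Python sketch below uses hypothetical names and arbitrary parameter values chosen only for the illustration.

```python
import math

def likelihood(matrix, theta, eps):
    """Likelihood (2.1.3); matrix[v][i] = a_vi, rows = persons."""
    value = 1.0
    for v, row in enumerate(matrix):
        for i, a in enumerate(row):
            value *= (theta[v] * eps[i]) ** a / (1.0 + theta[v] * eps[i])
    return value

# Two different matrices with the same raw scores (1, 1)
# and the same item scores (1, 1).
m1 = [[1, 0],
      [0, 1]]
m2 = [[0, 1],
      [1, 0]]

theta = [0.7, 1.9]   # assumed person parameters
eps = [0.4, 2.5]     # assumed item parameters

l1 = likelihood(m1, theta, eps)
l2 = likelihood(m2, theta, eps)
```

Both likelihoods have the numerator $\theta_1 \theta_2 \varepsilon_1 \varepsilon_2$ and the same denominator, so they coincide whatever parameter values are chosen.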

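The likelihood function $\Lambda$ can be maximized in the usual way
with respect to the parameters to yield ML estimates of the
$(\theta_v)$ and $(\varepsilon_i)$ (Wright & Panchapakesan, 1969; Martin-Löf,
1973; Fischer, 1974, p. 257 ff; Wright & Douglas, 1977). Written
in a simple form, although not very suitable for computations,
the estimating equations are:

(2.1.4)    $r_v = \sum_{i=1}^{k} \dfrac{\theta_v \varepsilon_i}{1 + \theta_v \varepsilon_i} \quad (v = 1, \ldots, n)$

$s_i = \sum_{v=1}^{n} \dfrac{\theta_v \varepsilon_i}{1 + \theta_v \varepsilon_i} \quad (i = 1, \ldots, k)$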



There is one more parameter to be estimated than there are
independent equations, a problem that can be solved through using
some kind of normation. One possibility is putting the parameter
value of one item to unity and another possibility is using
the product normation $\prod_i \varepsilon_i = 1$. The system of equations can only
be solved iteratively but there do exist efficient computer
programs for this purpose (Wright & Panchapakesan, 1969;
Wright & Mead, 1977).




The approach sketched above is the unconditional maximum
likelihood (UML) approach that was mentioned above.
In that context it was also pointed out that the UML method
produces estimates which are not consistent.

That ML estimators in certain situations fail to be consistent
was first discovered by Neyman & Scott (1948). One class
of situations in which this occurs is the one in which the
model contains incidental (or nuisance) parameters besides
those structural parameters which are to be estimated. The
most commonly known example of such a situation is the estimate
of a population variance for a normal distribution
(see Andersen, 1973a, p. 1^1 ff; Martin-Löf, 1973, p. 76). It
is known that the deviation sum of squares is to be divided
by n-1 to give the unbiased estimate. The ML estimator,
however, can be shown to make use of n as the denominator
and this estimate is thus biased. This occurs because the
population mean must be estimated from the sample data;
each new sample will thus give a new value of this (incidental)
parameter.
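The size of this bias is easy to verify by exact enumeration. The Python sketch below is an illustration only: instead of a normal distribution it uses a two-point population (values 0 and 1 with equal probability, variance 0.25), an assumption made here so that all possible samples can be listed without simulation. The bias factor (n-1)/n does not depend on the form of the distribution.

```python
from itertools import product

def ml_variance(x):
    """Deviation sum of squares divided by n (the ML estimator)."""
    m = sum(x) / len(x)
    return sum((xi - m) ** 2 for xi in x) / len(x)

def unbiased_variance(x):
    """Deviation sum of squares divided by n - 1."""
    m = sum(x) / len(x)
    return sum((xi - m) ** 2 for xi in x) / (len(x) - 1)

n = 3
samples = list(product([0, 1], repeat=n))   # 8 equally likely samples

mean_ml = sum(ml_variance(s) for s in samples) / len(samples)
mean_unbiased = sum(unbiased_variance(s) for s in samples) / len(samples)
true_variance = 0.25
```

Averaged over all samples, the estimator with denominator n-1 recovers the population variance, while the ML estimator falls short by the factor (n-1)/n.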

In the Rasch model the person parameters are incidental
parameters when we want to estimate the item parameters (and the
item parameters are incidental parameters when we want to
estimate the person parameters) and of course the number of
person parameters does not stabilize when we increase the
sample size since each new person brings a new parameter
(this fact must not be confused with the fact that since the
model is discrete we can only get a limited number of
estimates of all the possible person parameters).

But it has been shown (Andersen, 1973a) that if the likelihood
equation can be formulated only in the item parameters
consistency is assured. This can be done if there exists
a minimal sufficient statistic for the person parameters and
in the Rasch model, and only in the Rasch model, raw score
is such a statistic for ability. Thus it is possible to
formulate ML estimators for the item parameters through
conditioning on raw score. The details of this are presented below
but first we shall discuss another approach to come to grips
with the inconsistency of the UML estimates.

This approach consists in seeking corrections to rectify the
UML estimates, in a similar vein to the way in which the
ML estimate for the population variance is corrected
by the factor n/(n-1). Simulation studies carried out within
the range of 2 to ^0 items (Wright & Panchapakesan, 1969;
Fischer & Scheiblechner, 1970; Wright & Douglas, 1977) have
indicated that for item parameters on the log scale a correction
factor of (k-1)/k is suitable, and when this correction
is applied the difference between the UML and CML estimates
is generally not greater than one unit in the second decimal
place.

Since the CML estimates are quite cumbersome computationally
it could be argued that the corrected UML estimates would do
for all practical purposes. There are, however, three reasons
for which the UML estimates should be discarded, in spite of
this, in favor of the CML estimates. The first reason is that
the particular correction factor employed is empirically
rather than theoretically derived and its validity hinges
entirely on the range of situations studied in the simulations.
My own impression is that the correction used works
quite well in situations where the variance of person
parameters is not too large but that it tends to become poorer
when this variance increases. (These observations were made
when data were generated under the two-parameter model with
a high discrimination parameter common to all items,
and then analyzed under both the UML and CML approaches. The
Rasch model of course does not assume that the discrimination
parameter for all items is exactly unity; all that is assumed
is that all the items have the same discrimination parameter.
Varying item discriminations among sets of items is taken into
account as a simultaneous transformation of the scales of item
and person parameters and a high discrimination shows as a
high variance of the person parameters.)

The second reason is that no correction has as yet been
found for the bias in the person parameters. In virtually all
the computer programs for UML estimation the person parameters
obtained in the simultaneous estimation of parameters are
presented; somewhat better results would be obtained if the
person parameters instead were estimated from the corrected
item parameters. (In fact, since the problem of a strictly
conditional estimation of person parameters is not solved
yet, this would in many cases place the UML and CML approaches
on an equal basis.) Even without presenting any exact figures
it can be claimed that the bias in the UML estimated person
parameters is rather severe. It can be observed that when
data are generated in accordance with the Rasch model with a
standard deviation (s) of the person parameters of, say, unity
and the s of the estimated person parameters is computed, a
rather similar value is observed. But in fact the observed
person parameters should have a higher s since another variance
component (corresponding to the standard error of measurement)
has been introduced in generating the data.

The third reason in favor of the CML approach concerns the
possibilities of testing goodness of fit: under the UML
approach only approximate techniques have been proposed
(Wright & Panchapakesan, 1969; Mead, 1976b) while under the
CML approach there are test statistics which have an at least
asymptotically known distribution (see chapter 3).

The most important reason for not employing the CML approach 
has been that numerical problems have prevented its use with 
more than a limited number of items. It is, however, shown be- 
low that these problems can be solved. 

In developing the conditional approach let us first for
simplicity consider a given examinee with the raw score $r_v$,
corresponding to the person parameter $\theta_v$. The probability
of obtaining any score vector $(a_{vi})$ given the person
parameter and the vector of item parameters is:

(2.1.5)    $P\{(a_{vi}) \mid \theta_v, (\varepsilon_i)\} = \prod_{i=1}^{k} \dfrac{(\theta_v \varepsilon_i)^{a_{vi}}}{1 + \theta_v \varepsilon_i}$



To be able to express this probability as a conditional
probability given score $r_v$ we must know the probability of
obtaining score $r_v$ given $\theta_v$. This latter probability is given
by the sum of the probabilities of all possible ways of
obtaining the score $r_v$, that is the sum of all the expressions
like (2.1.5) in which the vector $(a_{vi})$ sums to r.

A given score r obtained on k items can of course be obtained
in $\binom{k}{r}$ different ways. We will need a special notation to be
able to express this in a simple way. Define:

(2.1.6)    $\gamma_r\{(\varepsilon_i)\} = \sum_{\sum_i a_{vi} = r} \; \prod_{i=1}^{k} \varepsilon_i^{a_{vi}}$

In the expansion of this sum of products the summation is
made over those $\binom{k}{r}$ combinations in which $\sum_i a_{vi} = r$. The
$\gamma_r\{(\varepsilon_i)\}$ (or, for short, $\gamma_r$) is called the elementary symmetric
function of order r in the parameters $(\varepsilon_i)$. (On the following
pages a more concrete presentation of these symmetric
functions will be made.)

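We can now write the probability of obtaining the score r
given $\theta_v$ and $(\varepsilon_i)$:

(2.1.7)    $P\{r \mid \theta_v, (\varepsilon_i)\} = \sum_{\sum_i a_{vi} = r} \; \prod_{i=1}^{k} \dfrac{(\theta_v \varepsilon_i)^{a_{vi}}}{1 + \theta_v \varepsilon_i} = \dfrac{\theta_v^{r} \gamma_r}{\prod_{i=1}^{k} (1 + \theta_v \varepsilon_i)}$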



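The conditional probability of obtaining the vector $(a_{vi})$
with the total score $r_v$, given the score $r_v$, is thus given by
dividing equation (2.1.5) by equation (2.1.7):

(2.1.8)    $P\{(a_{vi}) \mid r_v, (\varepsilon_i)\} = \dfrac{P\{(a_{vi}) \mid \theta_v, (\varepsilon_i)\}}{P\{r_v \mid \theta_v, (\varepsilon_i)\}} = \dfrac{\prod_{i=1}^{k} \varepsilon_i^{a_{vi}}}{\gamma_{r_v}}$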
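The cancellation of the person parameter in (2.1.8) can be checked numerically for a short test. The following Python sketch is illustrative only: it evaluates the symmetric functions by brute force directly from the definition (2.1.6), which is feasible only for very small k, and the function names and parameter values are assumptions.

```python
from itertools import combinations, product
from math import prod, isclose

def esf_bruteforce(eps, r):
    """gamma_r from definition (2.1.6): sum of all products of r parameters."""
    return sum(prod(c) for c in combinations(eps, r))

def pattern_prob(a, theta, eps):
    """Probability (2.1.5) of the response vector a."""
    return prod((theta * e) ** ai / (1.0 + theta * e) for ai, e in zip(a, eps))

def conditional_prob(a, theta, eps):
    """Left-hand side of (2.1.8): (2.1.5) divided by (2.1.7)."""
    r = sum(a)
    p_r = sum(pattern_prob(b, theta, eps)
              for b in product([0, 1], repeat=len(eps)) if sum(b) == r)
    return pattern_prob(a, theta, eps) / p_r

eps = [0.5, 1.0, 2.0]      # assumed item parameters
a = (1, 0, 1)              # response vector with raw score 2

closed_form = prod(e ** ai for ai, e in zip(a, eps)) / esf_bruteforce(eps, 2)
low_ability = conditional_prob(a, 0.3, eps)
high_ability = conditional_prob(a, 4.0, eps)
```

The conditional probability is the same for a person of low and of high ability and equals the right-hand side of (2.1.8), here 1.0/3.5.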



We thus see that this conditional probability is not a function
of $\theta_v$; only the item parameters appear in the expression.



Since the examinees are assumed to be independent we obtain
the conditional likelihood of the data matrix $((a_{vi}))$ for n
persons as:

(2.1.9)    $\Lambda = \prod_{v=1}^{n} \dfrac{\prod_{i=1}^{k} \varepsilon_i^{a_{vi}}}{\gamma_{r_v}}$



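If we use $n_r$ to denote the number of persons with raw score r
(r = 1, ..., k-1) and recall that $s_i$ is the score of item i
($s_i$ = 1, ..., n-1) we can simplify (2.1.9) into:

(2.1.10)    $\Lambda = \dfrac{\prod_{v=1}^{n} \prod_{i=1}^{k} \varepsilon_i^{a_{vi}}}{\prod_{v=1}^{n} \gamma_{r_v}} = \dfrac{\prod_{i=1}^{k} \varepsilon_i^{s_i}}{\prod_{r=1}^{k-1} \gamma_r^{n_r}}$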



From this conditional likelihood function the CML estimators
can be derived. If we first take the logarithm of both sides
we get:

(2.1.11)    $\log \Lambda = \sum_{i=1}^{k} s_i \log \varepsilon_i - \sum_{r=1}^{k-1} n_r \log \gamma_r$



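We differentiate with respect to all the $\varepsilon_i$ and set the
derivatives equal to zero:

(2.1.12)    $\dfrac{\partial \log \Lambda}{\partial \varepsilon_i} = \dfrac{s_i}{\varepsilon_i} - \sum_{r=1}^{k-1} n_r \dfrac{\gamma_{r-1}^{(i)}}{\gamma_r} = 0 \quad (i = 1, \ldots, k)$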

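in which equation the symbol $\gamma_{r-1}^{(i)}$ is used for the partial
derivative of $\gamma_r$ with respect to $\varepsilon_i$. This derivative is a
symmetric function of order r-1 in all parameters except $\varepsilon_i$.
This is most easily seen in an example. Suppose that
k=4 and that we are studying $\gamma_2$.
In explicit notation the symmetric function can be written:

$\gamma_2 = \varepsilon_1\varepsilon_2 + \varepsilon_1\varepsilon_3 + \varepsilon_1\varepsilon_4 + \varepsilon_2\varepsilon_3 + \varepsilon_2\varepsilon_4 + \varepsilon_3\varepsilon_4$

and

$\dfrac{\partial \gamma_2}{\partial \varepsilon_1} = \varepsilon_2 + \varepsilon_3 + \varepsilon_4 = \gamma_1(\varepsilon_2, \varepsilon_3, \varepsilon_4) = \gamma_1^{(1)}(\varepsilon_1, \varepsilon_2, \varepsilon_3, \varepsilon_4)$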

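From (2.1.12) it is seen that we end up with a set of nonlinear
equations in the $(\varepsilon_i)$ (for simplicity we will not distinguish
between the parameters $(\varepsilon_i)$ and the estimates $(\hat{\varepsilon}_i)$ of the
parameters):

(2.1.13)    $s_i = \varepsilon_i \sum_{r=1}^{k-1} n_r \dfrac{\gamma_{r-1}^{(i)}}{\gamma_r} \quad (i = 1, \ldots, k)$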

From the fact that $\sum_i s_i = \sum_r r\,n_r$ it follows that we must impose
some constraint on the system of equations to be able to
solve it. The same normations as those mentioned above (p 22)
in connection with the UML approach are of course available
and we can set $\varepsilon_m = 1$. (It is practical to select m as the item
with medium difficulty. This is done automatically in the
PML program, but there is also the option to select any item.)

Even after normation it is not possible to find an explicit
solution to the system of equations but there exist numerical
methods (Andersen, 1972; Martin-Löf, 1973; Fischer, 1974)
which can be used. In the application of these iterative
methods there are two important problems to be solved: the
first pertains to the computation of the symmetric functions
$\gamma_r$, and the second to how a rapid convergence of the sequence
of iterations can be obtained.



The computation of the symmetric functions

The symmetric function of order r consists of a sum of $\binom{k}{r}$
products, each of which consists of r factors. For example,
when k=50 and r=25 the symmetric function is a sum of about
$1.26 \times 10^{14}$ terms, each of which is a product of 25 factors.
Obviously it is impossible to compute the $\gamma_r$ and the derivatives
through a process of straightforward multiplication
and summation.
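The order of magnitude quoted above follows directly from the binomial coefficient. A minimal Python check (the variable names are illustrative):

```python
from math import comb

# Number of products in the symmetric function of order 25 for 50 items.
n_terms = comb(50, 25)

# Each product of 25 factors needs 24 multiplications when formed directly.
n_multiplications = 24 * n_terms
```

The count is 126 410 606 437 752, that is about $1.26 \times 10^{14}$ products for this single order alone.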



Fortunately there do exist recursive formulas which make a
relatively rapid computation of the symmetric functions
possible (Fischer, 1974, p. 2^2 ff; Andersen, 1972). We can
write:

(2.1.14)    $\gamma_r = \varepsilon_i \gamma_{r-1}^{(i)} + \gamma_r^{(i)}$

This is true since $\gamma_r^{(i)}$ is the sum of all products of r
parameters that do not contain $\varepsilon_i$, and $\varepsilon_i \gamma_{r-1}^{(i)}$ is the sum of all
products of r parameters that contain $\varepsilon_i$. An example should
clarify this. Suppose that k=4 and r=2. Then we want to get:



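(2.1.15)    $\gamma_2 = \varepsilon_1\varepsilon_2 + \varepsilon_1\varepsilon_3 + \varepsilon_1\varepsilon_4 + \varepsilon_2\varepsilon_3 + \varepsilon_2\varepsilon_4 + \varepsilon_3\varepsilon_4$

If we take the partial derivative of $\gamma_2$ with respect to $\varepsilon_1$
we get:

(2.1.16)    $\gamma_1^{(1)} = \varepsilon_2 + \varepsilon_3 + \varepsilon_4$

and

(2.1.17)    $\varepsilon_1 \gamma_1^{(1)} = \varepsilon_1\varepsilon_2 + \varepsilon_1\varepsilon_3 + \varepsilon_1\varepsilon_4$

In the same way we can easily convince ourselves that the
partial derivative of $\gamma_3$ with respect to $\varepsilon_1$ is:

(2.1.18)    $\gamma_2^{(1)} = \varepsilon_2\varepsilon_3 + \varepsilon_2\varepsilon_4 + \varepsilon_3\varepsilon_4$

If we now compare the sum of (2.1.18) and (2.1.17) with
(2.1.15) we find them equivalent.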

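Another recursive relationship of great use is the following:

(2.1.19)    $r \gamma_r = \sum_{i=1}^{k} \varepsilon_i \gamma_{r-1}^{(i)}$

This formula can be derived from (2.1.14) but again we use
the example to convince ourselves. We get:

(2.1.20)
$\varepsilon_1 \gamma_1^{(1)} = \varepsilon_1\varepsilon_2 + \varepsilon_1\varepsilon_3 + \varepsilon_1\varepsilon_4$
$\varepsilon_2 \gamma_1^{(2)} = \varepsilon_1\varepsilon_2 + \varepsilon_2\varepsilon_3 + \varepsilon_2\varepsilon_4$
$\varepsilon_3 \gamma_1^{(3)} = \varepsilon_1\varepsilon_3 + \varepsilon_2\varepsilon_3 + \varepsilon_3\varepsilon_4$
$\varepsilon_4 \gamma_1^{(4)} = \varepsilon_1\varepsilon_4 + \varepsilon_2\varepsilon_4 + \varepsilon_3\varepsilon_4$

We thus see that in this set of equations the six product
terms in (2.1.15) each appear two times.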

From the two recursive formulas (2.1.14) and (2.1.19) it is
possible to devise a very efficient algorithm for the computation
of the symmetric functions of all orders and all
the derivatives. Starting from the fact that $\gamma_0^{(i)} = 1$ we get
$\gamma_1 = \sum_i \varepsilon_i \gamma_0^{(i)} = \sum_i \varepsilon_i$. Then we compute $\gamma_1^{(i)} = \gamma_1 - \varepsilon_i \gamma_0^{(i)}$ and all
the other derivatives of the symmetric functions of order
two. In the next step we get $2\gamma_2 = \sum_i \varepsilon_i \gamma_1^{(i)}$ and can then obtain
all the derivatives of the functions of the third
order, and so on.
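The alternation between (2.1.19) and (2.1.14) can be sketched as follows in Python. This is a minimal illustration and not the FORTRAN subroutine used in PML: the function name is hypothetical, and the recursion is run only from order one upwards, without the meeting "from above" and without any safeguard against round-off.

```python
def esf_difference(eps):
    """Symmetric functions and first derivatives by (2.1.19) and (2.1.14).

    Returns (gamma, deriv) where gamma[r] = gamma_r and
    deriv[r][i] = gamma_r^{(i)}, the function of order r without item i.
    """
    k = len(eps)
    gamma = [1.0]
    deriv = [[1.0] * k]                      # gamma_0^{(i)} = 1
    for r in range(1, k + 1):
        # (2.1.19): r * gamma_r = sum_i eps_i * gamma_{r-1}^{(i)}
        g_r = sum(e * d for e, d in zip(eps, deriv[r - 1])) / r
        gamma.append(g_r)
        # (2.1.14): gamma_r^{(i)} = gamma_r - eps_i * gamma_{r-1}^{(i)}
        deriv.append([g_r - e * d for e, d in zip(eps, deriv[r - 1])])
    return gamma, deriv

gamma, deriv = esf_difference([1.0, 2.0, 3.0, 4.0])
```

For the parameters 1, 2, 3, 4 the symmetric functions are 1, 10, 35, 50, 24, and the functions with the first item left out are 1, 9, 26, 24, 0. The subtraction in the second step is the operation that loses accuracy when the terms become large.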






The algorithm has been programmed by Fischer (1974)
and this subroutine is used as one of the methods of computing
the symmetric functions in the PML program. The algorithm
has the virtue of being very fast: only multiplications,
k additions, k^ subtractions and k divisions are
performed. It has one serious drawback, however: when the
number of items is large and/or there are great differences
in the size of the item parameters the computations break
down as a consequence of round-off errors. The problem is
caused by the differences $\gamma_r - \varepsilon_i \gamma_{r-1}^{(i)}$ which, particularly for
the orders around k/2, involve very large numbers, resulting
in cancellation of terms. These problems are reduced if, as
is done in the algorithm used, the recursive formulas are
applied both from "below", starting with order one, and
from "above", starting with order k, and then meeting at
about k/2 (which procedure also allows a test of computational
accuracy). However, even with this method there is,
when k is large, a virtually inescapable loss of accuracy
when floating-point representation is used, and even attempts
to use extended precision (REAL*16 on IBM machines) have
failed to appreciably increase the number of items that can
be analyzed.

The breakdown of this algorithm (which will be referred to
as the Difference algorithm) occurs in the range of 20-40
items. Since k for many tests is within this range, use of
this algorithm is accompanied by the frustrating experience
that sometimes the analysis breaks down, and that sometimes
it does not, for example when different sub-groups are
studied.



Fortunately, it is possible to find a recursive formula for
the computation of the symmetric functions in which no use
is made of subtraction. We can write (2.1.14) in a slightly
different way:

(2.1.21)    $\gamma_r(\varepsilon_1, \ldots, \varepsilon_t) = \gamma_r(\varepsilon_1, \ldots, \varepsilon_{t-1}) + \varepsilon_t \gamma_{r-1}(\varepsilon_1, \ldots, \varepsilon_{t-1})$

(Fischer, 1974, p. 250).



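But this means that we can add one parameter at a time, so
to speak. If we start with $\varepsilon_1$ then $\gamma_1(\varepsilon_1) = \varepsilon_1$ and $\gamma_0(\varepsilon_1) = 1$.
(A symmetric function whose order exceeds the number of parameters is zero.)
If we add one more parameter, $\varepsilon_2$, we have:

$\gamma_1(\varepsilon_1, \varepsilon_2) = \gamma_1(\varepsilon_1) + \varepsilon_2 \gamma_0(\varepsilon_1) = \varepsilon_1 + \varepsilon_2$
$\gamma_2(\varepsilon_1, \varepsilon_2) = \gamma_2(\varepsilon_1) + \varepsilon_2 \gamma_1(\varepsilon_1) = \varepsilon_1 \varepsilon_2$

Adding a third parameter we get:

$\gamma_1(\varepsilon_1, \varepsilon_2, \varepsilon_3) = \gamma_1(\varepsilon_1, \varepsilon_2) + \varepsilon_3 \gamma_0(\varepsilon_1, \varepsilon_2) = \varepsilon_1 + \varepsilon_2 + \varepsilon_3$
$\gamma_2(\varepsilon_1, \varepsilon_2, \varepsilon_3) = \gamma_2(\varepsilon_1, \varepsilon_2) + \varepsilon_3 \gamma_1(\varepsilon_1, \varepsilon_2) = \varepsilon_1 \varepsilon_2 + \varepsilon_1 \varepsilon_3 + \varepsilon_2 \varepsilon_3$
$\gamma_3(\varepsilon_1, \varepsilon_2, \varepsilon_3) = \gamma_3(\varepsilon_1, \varepsilon_2) + \varepsilon_3 \gamma_2(\varepsilon_1, \varepsilon_2) = \varepsilon_1 \varepsilon_2 \varepsilon_3$

and so on.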

After we have added the k parameters we have thus obtained
the symmetric functions of all orders in the parameters.
This algorithm too has been programmed by Fischer (1974,
p 5^^4), who uses it to compute the second partial derivatives
of the symmetric functions, which is done through setting
equal to zero the parameter values for all combinations of
items two at a time. But it can of course also be used,
with some slight alterations, to compute the symmetric
functions themselves, as well as the first derivatives.
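A minimal Python sketch of this scheme is given below. It is not a transcription of Fischer's routine: the function names are hypothetical, and the first derivatives are obtained by leaving item i out of the parameter list, which for these functions is equivalent to setting its parameter to zero.

```python
def esf_summation(eps):
    """gamma_0, ..., gamma_k by adding one parameter at a time, as in (2.1.21)."""
    gamma = [1.0]                                   # gamma_0 of an empty set
    for e in eps:
        # New list built from the old one, so no value is overwritten too early.
        gamma = [1.0] + [
            (gamma[r] if r < len(gamma) else 0.0) + e * gamma[r - 1]
            for r in range(1, len(gamma) + 1)
        ]
    return gamma

def esf_first_derivatives(eps):
    """deriv[i][r] = gamma_r^{(i)}: symmetric functions without item i."""
    return [esf_summation(eps[:i] + eps[i + 1:]) for i in range(len(eps))]

gamma = esf_summation([1.0, 2.0, 3.0, 4.0])
deriv = esf_first_derivatives([1.0, 2.0, 3.0, 4.0])
```

Only additions and multiplications of positive numbers occur, which is why the accuracy problems of the Difference algorithm do not arise; the price is one pass over the parameters for the functions themselves and one further pass for each item.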

In order to obtain the symmetric functions and the derivatives
the routine has to be called (k+1) times. Each call to the
routine makes use of $k(k-1)/2$ multiplications and $k(k-1)/2 + k - 1$
additions, so to obtain the needed information roughly
$(k+1)\,k(k-1)/2$ multiplications and $(k+1)\{k(k-1)/2 + k - 1\}$ additions
are performed. If the number of arithmetic operations necessary
for this algorithm (which will be referred to as the
Summation algorithm) is compared with the number of operations
used by the Difference algorithm it is found that the
Summation algorithm is slower. It can also be seen that
execution time must increase rapidly as k gets larger.

Nevertheless, the Summation algorithm is not unbearably slow:
a complete iteration cycle, which involves computation of the
symmetric functions of all orders and all the first derivatives,
requires for items about 1 second of CPU time on the
IBM 360/65, and for 60 items about 4 seconds are required. For
a long test containing 100 items some 20 seconds would be
required for each iteration.

These estimates of computer time required are valid for the
case when the computations are carried out in double precision.
However, since the Summation algorithm is very accurate
numerically there is in the PML program an option of using
single precision arithmetic in this algorithm. When this option
is used, somewhat less computer time (a reduction of some ).0
per cent is a reasonable estimate) is required.

In addition to the fact that the amount of computer time
required may become prohibitive when very long tests (k>100,
say) are analyzed there is one more problem that may appear.
The problem is that the symmetric functions, and especially
those of orders around k/2, assume very large values and
sooner or later the limit set by the size of the floating
point numbers which can be represented in the particular
computer used will be reached. This problem could be solved
through scaling down the parameter values, but since the
product normation is used in the PML program after the
parameters have been estimated this method is not immediately
available in the program.

The amount of computer time required for each iteration is
one factor affecting the cost of the analysis. Another important
factor is of course the number of iterations required.
How to obtain a rapid convergence is discussed next.



The convergence of the iterations

Several different methods have been proposed for the solution
of the system of non-linear equations (2.1.13). Andersen
(1972) suggested Fisher's Method of Scoring for the equations
to be solved in the polychotomous model, which has been
programmed by Allerup and Sorber (1977). This method requires
only a few iterations but on the other hand the computation of
improvements makes use of the second derivatives of the
symmetric functions so each iteration cycle is very time
consuming. For example, a test with 40 dichotomous items required
about 3 minutes on the IBM 360/65 with this program.



Another method, suggested by among others Martin-Löf (1973)
and also presented by Fischer (1974), makes use of a simple
switching between the right hand side and the left hand side
of the equations (2.1.13). This is the method used in the PML
program and it is presented in greater detail below.

A first problem is how to choose start values for the iterations.
One simple solution is to put all the $(\varepsilon_i)$ equal to
unity. Martin-Löf (1973) suggested that start values can be
obtained through an approximate solution to the equations
(2.1.13) using a linearisation in the parameters:

(2.1.22)    $\log \varepsilon_i = \dfrac{s_i - \bar{s}}{\sum_{r=1}^{k-1} n_r \dfrac{r(k-r)}{k(k-1)}} \quad (i = 1, \ldots, k)$

where $\bar{s}$ is the mean item score.



Both methods of selecting start values are available in the
program. Sometimes use of the approximation effects a considerable
saving of iterations in comparison with when unities
are used as start values; sometimes the approximation has no
appreciable effect. There should be no risks involved in using
it, however, so it can be regularly applied.

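In each new iteration cycle, t+1, new values for the parameters
are computed from the previous cycle t as:

(2.1.23)    $\varepsilon_i^{(t+1)} = \dfrac{s_i}{\sum_{r=1}^{k-1} n_r \, \gamma_{r-1}^{(i)} / \gamma_r} \quad (i = 1, \ldots, k)$

in which the symmetric functions are evaluated at the parameter
values from cycle t. When the absolute difference between
$\varepsilon_i^{(t+1)}$ and $\varepsilon_i^{(t)}$ is less than
a specified value (in the program it is taken to be .001 but
it can be changed at will) for all items the iteration
sequence is stopped.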
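The iteration (2.1.23) can be sketched in Python as follows. The sketch is illustrative and makes several assumptions that are not taken from the PML program: the function names are hypothetical, start values are all unity, the symmetric functions are computed with the summation recursion (2.1.21), and the estimates are rescaled to product normation after every cycle rather than by fixing one item at unity.

```python
import math

def esf(eps):
    """Elementary symmetric functions gamma_0..gamma_k via (2.1.21)."""
    gamma = [1.0]
    for e in eps:
        gamma = [1.0] + [
            (gamma[r] if r < len(gamma) else 0.0) + e * gamma[r - 1]
            for r in range(1, len(gamma) + 1)
        ]
    return gamma

def cml_estimates(item_scores, n_r, tol=0.001, max_iter=10000):
    """Iterate (2.1.23); n_r[r] is the number of persons with raw score r."""
    k = len(item_scores)
    eps = [1.0] * k                                  # start values: unity
    for cycle in range(1, max_iter + 1):
        gamma = esf(eps)
        new = []
        for i in range(k):
            gamma_i = esf(eps[:i] + eps[i + 1:])     # gamma_r^{(i)}, r = 0..k-1
            denom = sum(n_r[r] * gamma_i[r - 1] / gamma[r] for r in range(1, k))
            new.append(item_scores[i] / denom)
        # Assumed normation: rescale so that the product of the estimates is 1.
        scale = math.exp(sum(math.log(x) for x in new) / k)
        new = [x / scale for x in new]
        if max(abs(a - b) for a, b in zip(new, eps)) < tol:
            return new, cycle
        eps = new
    raise RuntimeError("no convergence within max_iter cycles")

# Two items, 40 persons with raw score 1: item scores 30 and 10.
two_items, cycles = cml_estimates([30, 10], [0, 40, 0])
# Three items with equal item scores: all estimates should be equal.
three_items, _ = cml_estimates([3, 3, 3], [0, 3, 3, 0])
```

For two items the conditional estimate of the ratio of the item parameters is simply the ratio of the item scores, here 3, which the iteration reaches in two cycles; with equal item scores all estimates stay at unity.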

The number of iterations required depends to a very high degree
upon the range of parameter values, but for the most
part a rather large number of iterations is required (often
not less than 100). It has, however, been noted by Fischer
(1974, p. 2^15; see also Fischer & Allerup, 1968) that the
sequence of improvements tends to form terms in a geometric
series and as soon as three iteration steps have been performed
several iteration cycles can be saved through extrapolation.
(In numerical analysis this extrapolation is known
as the Aitken extrapolation, see Dahlquist & Björck, 1974,
p. 235.)

If we call the estimated parameter values for item i from
three successive iteration cycles $\varepsilon_i^{(t)}$, $\varepsilon_i^{(t+1)}$ and $\varepsilon_i^{(t+2)}$
we get the extrapolated value as:

(2.1.24)    $\varepsilon_i^{(\infty)} = \varepsilon_i^{(t+2)} - \dfrac{(\varepsilon_i^{(t+2)} - \varepsilon_i^{(t+1)})^2}{\varepsilon_i^{(t)} + \varepsilon_i^{(t+2)} - 2\varepsilon_i^{(t+1)}} \quad (i = 1, \ldots, k)$

Two new iteration cycles can then be performed, whereafter
the extrapolation can be applied anew.
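The effect of (2.1.24) is easiest to see on a sequence whose improvements form an exact geometric series. The Python sketch below is an illustration with a hypothetical function name; the guard against a zero denominator is an added assumption, and the upper limit on the amount of extrapolation discussed below is not included.

```python
def aitken(x0, x1, x2):
    """Extrapolated limit (2.1.24) from three successive iterates."""
    denom = x0 + x2 - 2.0 * x1
    if denom == 0.0:
        return x2            # iterates already stationary: nothing to gain
    return x2 - (x2 - x1) ** 2 / denom

# Iterates approaching 2 with improvements halving each cycle:
# 3, 2.5, 2.25, 2.125, ...
extrapolated = aitken(3.0, 2.5, 2.25)
```

Here the three iterates 3, 2.5 and 2.25 are still 0.25 away from the limit, while the extrapolated value is exactly 2. For real iteration sequences the improvements are only approximately geometric, so the gain is smaller but often still considerable.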

This method is available in the program and generally it
effects a very considerable saving of iterations. However, it
is reported by Fischer (1974, p. 2^i5) that the extrapolation
may also cause the iterations to diverge. To prevent this
from happening two precautions also mentioned by Fischer have
been taken. The first precaution is not to apply the extrapolation
on the basis of the results in the first few iteration
cycles and the second to set an upper limit on the
amount of extrapolation. In no case have I observed that the
Aitken extrapolation using these precautions caused
divergence, so it can probably be regularly applied.

It is impossible to give any generally valid estimate of the
number of iterations required for convergence since this to
some degree varies from problem to problem. It can be observed,
however, that the range of item parameters is of
critical importance: as soon as one or more of the parameters
assume high values a larger number of iterations is
required. For those problems in which the proportions of
correct answers on the items vary between, say, .10 and .90
convergence is, however, generally obtained within 8-20
iterations. For a test with ^0 items, the item parameters can
thus often be estimated in less than 20 seconds and for a
test with 60 items in a minute or so.



2.2 Estimating person parameters 

In estimating the person parameters we could in principle
proceed in a similar way as when estimating the item
parameters, i.e. through conditioning on item score a conditional
likelihood function expressed only in the person parameters
can be developed. It can be shown that the equations to be
solved can be written:

(2.2.1)    $r_v = \theta_v \sum_{i=1}^{k} \dfrac{\gamma_{s_i - 1}^{(v)}\{(\theta_v)\}}{\gamma_{s_i}\{(\theta_v)\}} \quad (v = 1, \ldots, n)$

(cf. Fischer, 1974, p. 2^0)



Unfortunately it is an impossible task to compute the symmetric
functions in the person parameters so it is not possible
to solve this system of equations.



If, however, it is assumed that the number of persons is large
in comparison with the number of items, we can treat the
estimates of the item parameters as fixed and estimate the person
parameters under this assumption. We then get the following
set of equations to solve:

(2.2.2)    $r = \sum_{i=1}^{k} \dfrac{\theta_r \varepsilon_i}{1 + \theta_r \varepsilon_i} \quad (r = 1, \ldots, k-1)$

We find that these equations are the same as those appearing
in (2.1.4) for UML estimates of the person parameters, except
that here the subscript v has been changed to the subscript
r, which is possible since all persons having the same raw
score must get the same estimated ability.

The equations can easily be solved iteratively using the
Newton-Raphson method. In the PML program a routine presented
by Fischer (1974, p. 525) is used to do this using the item
parameters resulting under the product normation $\prod_i \varepsilon_i = 1$.

Looking more closely at (2.2.2) we find that in one special
case the equations can be solved explicitly and this is when
all items are of equal difficulty (i.e. all item parameters
are 1). Then (2.2.2) reduces to

(2.2.3)    $r = \dfrac{k \theta_r}{1 + \theta_r} \quad (r = 1, \ldots, k-1)$

so

(2.2.4)    $\theta_r = \dfrac{r}{k - r}$

When the range of item parameters is not too great (2.2.4) is
used to compute start values for the iterations. This approximation
of course gets poorer the more the item parameters
vary, so when the difference between the largest and the
smallest item parameter on the log scale is greater than 2.0
another approximation presented by Wright & Douglas (1975, p.
22) is used to compute start values. This approximation is based
on an assumption of equally spaced item parameters:

(2.2.5)    $\theta_r = \dfrac{\{1 - \exp(-w r / k)\} \exp\{w (r/k - 1/2)\}}{1 - \exp\{-w (1 - r/k)\}}$

where $w = \log \varepsilon_{\max} - \log \varepsilon_{\min}$.
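A minimal Python sketch of the Newton-Raphson solution of (2.2.2), with start values chosen between (2.2.4) and (2.2.5) as described above, is given below. It is not the routine used in PML: the function names, the tolerance and the choice to iterate on the log scale are assumptions made for the illustration.

```python
import math

def start_value(r, eps):
    """Start value for theta_r: (2.2.4), or (2.2.5) when the log range exceeds 2."""
    k = len(eps)
    logs = [math.log(e) for e in eps]
    w = max(logs) - min(logs)
    f = r / k
    if w <= 2.0:
        return r / (k - r)
    return ((1.0 - math.exp(-w * f)) * math.exp(w * (f - 0.5))
            / (1.0 - math.exp(-w * (1.0 - f))))

def expected_score(theta, eps):
    """Right-hand side of (2.2.2)."""
    return sum(theta * e / (1.0 + theta * e) for e in eps)

def person_parameter(r, eps, tol=1e-10, max_iter=100):
    """Solve (2.2.2) for theta_r by Newton-Raphson on xi = log(theta)."""
    xi = math.log(start_value(r, eps))
    for _ in range(max_iter):
        theta = math.exp(xi)
        p = [theta * e / (1.0 + theta * e) for e in eps]
        residual = sum(p) - r
        slope = sum(pi * (1.0 - pi) for pi in p)   # derivative with respect to xi
        step = residual / slope
        xi -= step
        if abs(step) < tol:
            break
    return math.exp(xi)

theta_equal = person_parameter(2, [1.0] * 5)
spread_items = [math.exp(x) for x in (-2.0, -1.0, 0.0, 1.0, 2.0)]
theta_spread = person_parameter(2, spread_items)
```

With five items of equal difficulty the solution for raw score 2 is 2/3, as (2.2.4) states; with item parameters spread over four units on the log scale the start value comes from (2.2.5) and a few Newton steps are enough.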

From the presentation above it is obvious that, with this
method of estimating person parameters, the only thing that
influences the estimates is the distribution of item parameters.
The resulting estimates are known to be slightly biased but
it should be pointed out that one advantage is gained: as
soon as the item parameters are in hand, an ability scale
corresponding to the different raw scores is easily constructed
(see chapter 5 below).



2.3 The information function and confidence intervals for the
parameters

It is possible to determine standard errors for the estimates
of the parameters, which are based on the information function
with respect to each parameter. The statistical information
in the sample with respect to any parameter $\Pi$ is defined
as:

(2.3.1)    $I(\Pi) = E\left\{ \left( \dfrac{\partial \log \Lambda}{\partial \Pi} \right)^2 \right\}$

where $\Lambda$ is the likelihood function.

It can easily be shown (see e.g. Fischer, 1974, p. 294 ff)
that the information of item i with respect to the person
parameter is:

(2.3.2)   $I_i(\xi_v) = \dfrac{\exp(\xi_v-\sigma_i)}{\{1+\exp(\xi_v-\sigma_i)\}^{2}}$

The information of a test ($I_k$) with respect to the person
parameter is the sum of the information of each of the k
items:

(2.3.3)   $I_k(\xi_v) = \sum_{i=1}^{k} \dfrac{\exp(\xi_v-\sigma_i)}{\{1+\exp(\xi_v-\sigma_i)\}^{2}}$

Analogously we get the information in the sample with respect
to the item parameters ($I_n(\sigma_i)$):

(2.3.4)   $I_n(\sigma_i) = \sum_{v=1}^{n} \dfrac{\exp(\xi_v-\sigma_i)}{\{1+\exp(\xi_v-\sigma_i)\}^{2}}$



Confidence intervals for the item parameters

In the theory of ML estimation it is known that the estimates
are asymptotically normally distributed with the standard
error $1/\sqrt{I}$. We can thus construct confidence intervals
for the item parameters in the usual way:

(2.3.5)   $\hat{\sigma}_i - \dfrac{z_{\alpha/2}}{\sqrt{I_n(\hat{\sigma}_i)}} \le \sigma_i \le \hat{\sigma}_i + \dfrac{z_{\alpha/2}}{\sqrt{I_n(\hat{\sigma}_i)}}$

where $z_{\alpha/2}$ is the critical value from the normal distribution.
In most cases the asymptotic properties of these confidence
intervals should be assured since n is usually large.



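Confidence intervals for the person parameters

At least when the number of items is larger than, say, 20-40
items, it is possible to determine useful confidence
intervals for the person parameters:

(2.3.6)   $\hat{\xi}_v - \dfrac{z_{\alpha/2}}{\sqrt{I_k(\hat{\xi}_v)}} \le \xi_v \le \hat{\xi}_v + \dfrac{z_{\alpha/2}}{\sqrt{I_k(\hat{\xi}_v)}}$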



From the fact that the information function is a function of
ability it is clear that the confidence intervals will be
different for different person parameters. Thus, in contrast
with the classical psychometric theory, the LT models make no
assumption about homoscedastic standard errors of measurement.
Some details on how the functions for these standard errors
look for different tests are presented in chapter 5.1.

It must be observed that the estimates are approximately
normally distributed only when k is large, so only then can
these confidence intervals be trusted. But we have already
derived an expression (2.1.7) for the probability of observing
a certain raw score, given a person parameter and the item
parameters. This expression can of course be used for a
straightforward computation of the probabilities of
observing each different raw score (including 0 and k) for each
estimated person parameter. Such a matrix of probabilities
is included in the output from the PML program for $k \le 30$.
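One row of such a matrix can be sketched as follows. The code assumes that (2.1.7) has the usual form $P(r \mid \theta) = \gamma_r\theta^r/\prod_i(1+\theta\varepsilon_i)$, with $\gamma_r$ the elementary symmetric function of order r in the item parameters. The symmetric functions are obtained with a standard recurrence, which is not necessarily the algorithm used in the program, and the function names are hypothetical.

```python
def elementary_symmetric(eps):
    """Return [gamma_0, ..., gamma_k] for the item parameters eps."""
    gamma = [1.0] + [0.0] * len(eps)
    for count, e in enumerate(eps, start=1):
        # Update from high order to low so each item enters only once.
        for r in range(count, 0, -1):
            gamma[r] += e * gamma[r - 1]
    return gamma

def raw_score_probabilities(theta, eps):
    """P(raw score = r | theta) for r = 0..k, under the assumed (2.1.7)."""
    gamma = elementary_symmetric(eps)
    norm = 1.0
    for e in eps:
        norm *= 1.0 + theta * e
    return [gamma[r] * theta ** r / norm for r in range(len(eps) + 1)]

# Hypothetical example: four equal items and theta = 1 give Binomial(4, .5).
print(raw_score_probabilities(1.0, [1.0, 1.0, 1.0, 1.0]))
```

Calling the function once per estimated person parameter gives the full matrix, one row per raw score.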

Some comments on how to interpret confidence intervals around
person parameters might be in place. Lord and Novick (1968,
p. 511-512) stressed that any confidence statement about
which region a person's ability falls into can be made with
the specified probability only for a randomly chosen person.
We can in fact make no confidence statements "about a
particular, nonrandomly chosen examinee in whom we happen to be
interested. Nor can any confidence statements be made about
those examinees who have some specified observed score."
(Lord & Novick, 1968, p. 512).

It is a distressing fact that we can have no confidence in
confidence statements relating to specified observed scores;
for a particularly illuminating discussion of the problems
involved the reader is referred to Cronbach, Gleser, Nanda
and Rajaratnam (1972, p. 132-134).






The index of subject separation 



In some cases there is a need, when the Rasch model is applied,
to have a counterpart to the coefficient of reliability in the
classical theory, i.e. a measure of the accuracy with which
the relative positions of the subjects on the latent trait
can be discriminated. Such a measure has been introduced by
Andrich and Douglas (1977).

The traditional concept of reliability can be defined:

(2.3.7)   $r_{xx'} = \dfrac{\sigma_\tau^{2}}{\sigma_\tau^{2}+\sigma_e^{2}}$

where $\sigma_\tau^{2}$ is the variance of true scores and $\sigma_e^{2}$ is the
variance of the errors of measurement. From the assumptions
that the observed score can be written $x_v = \tau_v + e_v$ and that
true scores and errors are uncorrelated, it follows that we
can also write the reliability:

(2.3.8)   $r_{xx'} = \dfrac{\sigma_x^{2}-\sigma_e^{2}}{\sigma_x^{2}}$

The measure introduced by Andrich and Douglas (1977), called
the index of subject separation (ISS), serves as a
counterpart to the coefficient of reliability in those cases in
which we can obtain direct estimates of the variance of the
errors of measurement.

They argued that even though the variance of the errors of
measurement varies as a function of ability, the average of
the estimated error variances, $\hat{\sigma}_e^{2} = \frac{1}{n}\sum_{v=1}^{n}\hat{\sigma}_{e_v}^{2}$, can be
taken as a reasonable estimate of $\sigma_e^{2}$ in (2.3.8) above.
Since the variance of the estimated person parameters (i.e. the
counterpart to $\sigma_x^{2}$) is easily computed we have estimates
of all the quantities in (2.3.8) and can directly compute
the ISS according to this formula.
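The computation can be sketched in a few lines. This is an illustration under stated assumptions rather than the program's routine: the error variance of each person is taken as the squared standard error of the estimate, the variance of the estimates is computed with the n-1 divisor, and the function name and the three-person data are hypothetical.

```python
def index_of_subject_separation(xi_hat, se):
    """ISS = (var of estimates - mean error variance) / var of estimates.

    xi_hat: estimated person parameters, se: their standard errors.
    """
    n = len(xi_hat)
    mean = sum(xi_hat) / n
    var_x = sum((x - mean) ** 2 for x in xi_hat) / (n - 1)  # assumed divisor
    var_e = sum(s ** 2 for s in se) / n                      # average error variance
    return (var_x - var_e) / var_x

# Hypothetical data: three persons, equal standard errors of 0.5.
print(index_of_subject_separation([-1.0, 0.0, 1.0], [0.5, 0.5, 0.5]))
```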

This measure tends to give estimates that are highly similar
to estimates of the coefficient of reliability with KR-20
(both measures are given in the PML program) but there
are sometimes differences between them (when the sample is
severely skewed, for example, the ISS tends to be considerably
lower than KR-20).



The ISS of course shares with the coefficient of reliability
the characteristic of being sample specific but it appears
that the ISS has a conceptual advantage. The coefficient of
reliability can be low for two reasons: either because the
items are heterogeneous or because each of the levels of
ability is not measured with enough precision because too few
items are used. If, however, the ISS is low for a test
fitting the Rasch model we can rule out item heterogeneity as a
cause and instead concentrate on getting better estimates of
each level of ability through adding more items.






Chapter 3
TESTING GOODNESS OF FIT TO THE RASCH MODEL



It has been stressed above that as a consequence of the
rather strong assumptions underlying the Rasch model it is
very important that sound procedures for testing goodness of
fit are applied.

Several different procedures for testing goodness of fit to
the Rasch model have been suggested. Here, some methods based
on the CML approach for estimation of item parameters are
presented in detail: one graphic method for assessing item
fit (Allerup & Sorber, 1977) and two overall numerical tests
(Andersen, 1973b; Martin-Löf, 1973). There do exist other
more primitive methods for assessing goodness of fit, and
some of these are briefly mentioned first.

Since the item parameters, if the model holds, should show
no systematic differences if estimated from different
subgroups of the sample it is possible to plot such estimates
against each other and look for systematic deviations (Fischer,
1974, p. 281 ff). Since the standard errors are estimated too
it is also possible to test for each item the difference
between the estimates obtained in any two groups (Fischer,
1974, p. 297-298 and chapter 5.2 below).

Another approach to testing model fit for the Rasch model
has been developed by Mead (1976a, 1976b) and Wright and Mead
(1977). In this method, estimated item and person parameters
are used to predict scores at the item level and from the
residuals between observed and predicted scores chi-square-like
tests of item fit, person fit and overall fit are
developed. However, these tests have unknown asymptotic
distributions and simulation studies (Mead, 1976b) indicate that
even though the means of the distributions conform to the
expected values, the variances may depart substantially.






Before the tests based on the conditional approach mentioned
above are presented, it should be pointed out that a sound
application of statistical tests for evaluating goodness of
fit implies much more than the choice of a test statistic
with known properties. Any inferential method is strongly
dependent upon the number of observations made: when the
sample is too small even gross departures from the model will
be accepted and when the sample is very large even the
slightest deviation will cause us to reject the model. The
first problem reduces to one of making enough observations
to obtain a reasonable power in the test. Unfortunately
the power characteristics of the overall tests are unknown
but some simulation studies of this problem will be presented
below. The second problem, that since no model ever holds
perfectly true all models will be rejected provided that enough
observations are collected, has, however, been solved.
Martin-Löf (1974a) has introduced a measure called redundancy,
which on an absolute scale gives a measure of the degree to
which the data deviate from the model. This gives a basis for
accepting the model in some cases even though the test
statistic yields a significant value. This measure is described
below in section 3.3.

There are several other questions relating to strategic
applications of goodness of fit tests, such as trading
relationships between assumptions, item selection procedures,
cross validation problems and so on. Problems of this type will,
however, be discussed at length in chapter 4.

3.1 Testing item fit

Before the overall tests of goodness of fit are presented,
methods for evaluating goodness of fit at the item level will
be considered. Under the CML approach there exists no
statistical test that yields a p-value for the probability of fit
of each item. Instead graphic methods are employed. The
disadvantage of the graphic methods is that they involve an
inescapable element of judgement which, especially until
experience has been accumulated, may be quite difficult. But
the graphic methods have the important advantage that they
are not so strongly influenced as the inferential methods by
the sample size: thus deviations not detected by a powerless
statistical test may be possible to detect by a graphic method
and a statistically significant departure from the model may
be judged practically insignificant on the basis of a graphic
test.

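In investigating fit we do not work with the $((a_{vi}))$ matrix
introduced above on page 21 but reorganize it into the item
by score group frequency matrix of correct answers, $((n_{ir}))$,
in the following way:

$$
\begin{array}{c|ccccc|c}
 & \multicolumn{5}{c|}{\text{Score group}} & \\
\text{Item} & 1 & \cdots & r & \cdots & k-1 & \\
\hline
1 & n_{11} & \cdots & n_{1r} & \cdots & n_{1,k-1} & s_1 \\
\vdots & \vdots & & \vdots & & \vdots & \vdots \\
i & n_{i1} & \cdots & n_{ir} & \cdots & n_{i,k-1} & s_i \\
\vdots & \vdots & & \vdots & & \vdots & \vdots \\
k & n_{k1} & \cdots & n_{kr} & \cdots & n_{k,k-1} & s_k \\
\hline
 & & \cdots & r n_r & \cdots & &
\end{array}
$$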

It is obvious that:

(3.1.1)   $\sum_{r=1}^{k-1} n_{ir} = s_i$

and recalling that $n_r$ is the number of persons with raw score
r we see that:

(3.1.2)   $\sum_{i=1}^{k} n_{ir} = r\,n_r$



The observed proportion of correct answers to item i within
score group r is $n_{ir}/n_r$. We can also compute the predicted
proportion of correct answers to item i for score group r.
The conditional probability that a person with raw score r
answers item i correctly is the number of answer vectors in
which item i is answered correctly divided by the total
number of possible answer vectors which add up to r, i.e.:

(3.1.3)   $P\{A_{vi} = 1 \mid r, (\varepsilon_i)\} = \dfrac{\varepsilon_i\,\gamma_{r-1}^{(i)}}{\gamma_r}$

Thus, if the model holds true for the data the relation

(3.1.4)   $\dfrac{n_{ir}}{n_r} \approx \dfrac{\varepsilon_i\,\gamma_{r-1}^{(i)}}{\gamma_r}$

should hold for all score groups. If we, for a fixed item,
plot the observed proportion against the predicted proportion
the points should fall along a straight line with a slope of
unity. As a function of sampling error the points will of
course be spread around the line with unit slope; thus
systematic deviations from the predicted proportions along
different regions of the abscissa are what is to be looked for.
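The predicted proportions that form the abscissa of such a plot can be sketched as follows, reading (3.1.3) as $\varepsilon_i\gamma_{r-1}^{(i)}/\gamma_r$, where $\gamma_{r-1}^{(i)}$ is the symmetric function of order r-1 with item i left out. The recurrence for the symmetric functions is a standard one and not necessarily the program's own; the function names and the three item parameters are hypothetical.

```python
def elementary_symmetric(eps):
    """Return [gamma_0, ..., gamma_k] for the item parameters eps."""
    gamma = [1.0] + [0.0] * len(eps)
    for count, e in enumerate(eps, start=1):
        for r in range(count, 0, -1):
            gamma[r] += e * gamma[r - 1]
    return gamma

def predicted_proportions(eps):
    """Map raw score r (1..k-1) to the list of predicted proportions per item."""
    eps = list(eps)
    k = len(eps)
    gamma = elementary_symmetric(eps)
    # Symmetric functions with item i removed, one list per item.
    without = [elementary_symmetric(eps[:i] + eps[i + 1:]) for i in range(k)]
    return {r: [eps[i] * without[i][r - 1] / gamma[r] for i in range(k)]
            for r in range(1, k)}

# Hypothetical three-item test.
for r, row in predicted_proportions([0.5, 1.0, 2.0]).items():
    print(r, [round(p, 3) for p in row])
```

A useful check is that the predicted proportions within score group r add up to r, in agreement with (3.1.2).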

In the PML program this graphic test is produced as one
printer plot for each item, with each plot requiring about 1
second of CPU time on the IBM 360/65. Each plot uses one page
(or, to be more exact, 54 lines) of printed output.

Even though no statistical test yielding p-values for the fit
of each item has as yet been found within the conditional
approach it is possible to compute for each score group the
probability that an observed frequency of correct answers
deviates from what would be expected on the basis of the
model. Under the null hypothesis of model fit the $n_{ir}$ should
be distributed binomially, $Bin(n_r, \pi_{ri})$, where $\pi_{ri}$
denotes the predicted proportion (3.1.3). A two sided test can
thus be performed such that when $n_{ir} \le n_r\pi_{ri}$ the
probability of observing $n_{ir}$ or fewer correct answers is
computed and when $n_{ir} > n_r\pi_{ri}$ the probability of obtaining
$n_{ir}$ or more correct answers is computed, in both cases under
the assumption that the null hypothesis holds true.
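The tail probability for a single cell of the item by score group matrix can be sketched directly from this description. The sketch is not the Allerup and Sorber subroutine; the function name and the numbers in the example are hypothetical.

```python
from math import comb

def binomial_tail(n_ir, n_r, pi_ri):
    """Tail probability for one item within one score group.

    n_ir: observed number of correct answers, n_r: size of the score group,
    pi_ri: predicted proportion of correct answers under the model.
    """
    def pmf(x):
        return comb(n_r, x) * pi_ri ** x * (1.0 - pi_ri) ** (n_r - x)

    if n_ir <= n_r * pi_ri:
        # Observed at or below expectation: P(X <= n_ir).
        return sum(pmf(x) for x in range(0, n_ir + 1))
    # Observed above expectation: P(X >= n_ir).
    return sum(pmf(x) for x in range(n_ir, n_r + 1))

# Hypothetical cell: 2 correct answers among 10 persons, predicted proportion .5.
print(binomial_tail(2, 10, 0.5))
```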

These tests, too, are available in the PML program but it
should be pointed out that the power of these tests is lower
than the "power" of the graphic test in the sense that
systematic deviations from the model which can be detected with
the graphic test are often not detected with the binomial
test.

A slightly different version of the binomial test has been
presented by Allerup and Sorber (1977) and for computing the
cumulative binomial probability distribution a subroutine
written by these authors is used.

3.2 Overall tests of goodness of fit 

It has been shown by Rasch (1960) that it is possible to
devise a test of the model which is completely free from
estimated parameters. This test, which is a generalization of the
Fisher exact test for a 2x2 matrix, is, however, so
computationally cumbersome that it is impossible to put it into
practical use.

Thus methods based on estimated item parameters have to be
used. This, however, is no great sacrifice since it has been
shown by Martin-Löf (1973, 1974b) that certain tests based on
ML-estimates are parametric counterparts to generalisations
of the Fisher exact test.

There do exist two overall numerical tests of goodness of fit
for the Rasch model which are both asymptotically chi-square
distributed. One is a conditional likelihood ratio test
independently suggested by Martin-Löf (1973) and Andersen (1973b).
Since this test has come to be called the Andersen test the
same label will be used here. The other test is a chi-square
test computed from a quadratic form suggested by Martin-Löf
(1973). This test will be referred to as the Martin-Löf test.

The Andersen conditional likelihood ratio test

Likelihood ratio tests are intimately associated with ML
estimation and stated verbally in simple terms the general
principle of such tests is to compare values of the likelihood
function resulting from parameters estimated under competing
hypotheses.



The logarithm of the conditional likelihood function was
derived as formula (2.1.11) above and we repeat it here:

(3.2.1)   $\log \Lambda = \sum_{i=1}^{k} s_i \log \varepsilon_i - \sum_{r=1}^{k-1} n_r \log \gamma_r$

After having estimated the item parameters for the total
sample we can insert the estimated parameter values in (3.2.1)
to get the maximum value of the logarithm of the likelihood
function. We call the resulting value $H_t$.

Under the null hypothesis of model fit we should expect
essentially the same estimated values of the item parameters
whichever subgroup in the sample the estimates are based upon.
In the limit we can estimate the item parameters within each
of the k-1 score groups and still expect the same estimates
(within the limits of stochastic variation, of course). If we
compute the value of the logarithm of the likelihood function
for each of the score groups and call these $H_r$ (r=1,...,k-1)
we can form the statistic:

(3.2.2)   $\log \lambda = H_t - \sum_{r=1}^{k-1} H_r$

It can be shown that $-2\log\lambda$ is asymptotically chi-square
distributed, when each $n_r \to \infty$, with (k-1)(k-2) degrees of
freedom.

This particular form of the test can, however, seldom be used.
Only rarely is the sample size so large that sufficiently
stable estimates can be obtained within each score group, and
when there are differences among the item difficulties the
simple items tend to be answered correctly by all persons in
the higher score groups and the difficult items tend to be
answered correctly by no person in the lower score groups,
under which conditions it is not possible to estimate the
parameters.

However, Andersen (1973b) has shown that the test can be
computed also when adjacent score groups are pooled. Thus, if we
pool the k-1 score groups into g disjoint groups we can
estimate the parameters within each group, compute the $H_j$
(j=1,...,g) and form the statistic:

(3.2.3)   $\log \lambda = H_t - \sum_{j=1}^{g} H_j$

Here too $-2\log\lambda$ is asymptotically chi-square distributed when
$n_j \to \infty$, now with (g-1)(k-1) degrees of freedom.
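Once the maximized log likelihoods are in hand, the statistic and its degrees of freedom follow by simple arithmetic. The sketch below assumes the pooled form $\log\lambda = H_t - \sum_j H_j$ with (g-1)(k-1) degrees of freedom; the function name and the log-likelihood values are hypothetical.

```python
def andersen_statistic(h_total, h_groups, k):
    """Return (-2 log lambda, degrees of freedom) for g pooled score groups.

    h_total: maximized log likelihood for the total sample,
    h_groups: maximized log likelihoods within each of the g groups,
    k: number of items.
    """
    g = len(h_groups)
    log_lambda = h_total - sum(h_groups)
    return -2.0 * log_lambda, (g - 1) * (k - 1)

# Hypothetical values for a 15-item test analysed in two groups.
statistic, df = andersen_statistic(-1000.0, [-480.0, -510.5], k=15)
print(statistic, df)
```

The same arithmetic applies when the groups are defined in some other way than by raw score, as long as each group is analysed separately and once more in the merged sample.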

This test is available in the PML program with an automatic
grouping of the score groups. The grouping is carried out
under the constraints that there must be a minimum number (m)
of examinees within each group (this number can be specified,
with the default taken to be m=100) and that there must be no
zero or perfect item scores within any group.

The grouping process may fail either as a consequence of
choice of too high a value of m or as a consequence of there
being items answered correctly by all or no examinees in most
of the score groups (or as a consequence of a combination of
these two problems). The first problem can of course be
easily solved through the choice of a lower m but the second
problem can only be solved if those items causing the disturbance
are excluded.
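The two constraints can be illustrated with a simple greedy pooling of adjacent score groups. This is only a sketch of one way to satisfy them; the grouping algorithm actually used in the program is not described here, and the function name, arguments and data are hypothetical.

```python
def pool_score_groups(n_r, n_ir, m=100):
    """Pool adjacent score groups so that each pooled group has at least m
    examinees and no item has a zero or perfect score within it.

    n_r: number of persons with raw score 1..k-1,
    n_ir: one list per item with its correct answers in score groups 1..k-1.
    Returns a list of groups, each a list of raw scores.
    """
    groups, current = [], []
    for r in range(len(n_r)):
        current.append(r)
        size = sum(n_r[s] for s in current)
        correct = [sum(row[s] for s in current) for row in n_ir]
        if size >= m and all(0 < c < size for c in correct):
            groups.append(current)
            current = []
    if current:
        if not groups:
            raise ValueError("grouping failed: lower m or drop extreme items")
        groups[-1].extend(current)  # leftover scores join the last valid group
    return [[r + 1 for r in grp] for grp in groups]

# Hypothetical counts for a five-item test (raw scores 1 to 4).
n_r = [60, 50, 70, 40]
n_ir = [[10, 20, 40, 30] for _ in range(5)]
print(pool_score_groups(n_r, n_ir, m=100))
```

With a value of m larger than the sample the function raises an error, which corresponds to the first kind of failure mentioned above.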






The amount of computer time required for computing the test
depends upon three factors: the number of items in the test,
the number of groups in which the parameters are estimated
and the number of iterations required for convergence within
each of the groups. There are two reasons for which it is
necessary to choose an m so large that the grouping results
in only a few groups when k is large. The first reason is
that it may be quite time consuming just to estimate the item
parameters within the total group when the test consists of
many items; if this is to be repeated for a large number of
subgroups as well, the costs may become prohibitive. The
second reason is that a large number of iterations is often
required in groups composed of just the highest score groups.
The reason for this is that the proportion of correct answers
on the easiest item tends to be very high in these groups, in
which case the convergence is slow.

Thus, before testing goodness of fit of a long test it is
strongly recommended that the $((n_{ir}))$ matrix be inspected for
a suitable choice of m. In fact, for very long tests it may
even be impossible under a strict budget for computer time to
apply this test for overall goodness of fit. It should be
pointed out, however, that in the first steps of an item
selection procedure with the purpose of constructing a
unidimensional test conforming to the Rasch model, the graphic
tests give all the information needed. Only when the final
test is to be composed of very many items may an overall test
be required. In such a case, however, there is the possibility
of constructing the test in parts and then testing whether
the parts can be fitted together into one long test, using
the procedure described in chapter 5.2 below.

It is also possible to compute the conditional likelihood
ratio test for the equality of item parameters between
subgroups defined in other ways than through differing raw
scores. Each analysis with the PML program results in
the value of the maximum of the logarithm of the likelihood
function being printed, and these values can be used for
simple hand calculations. Thus, if separate analyses are
made within each disjoint subgroup (boys and girls, for
example) and one analysis is made with all groups merged into
one, all ingredients necessary for computing the test
statistic (3.2.3) are at hand and only a few arithmetic operations
are required. (For an example see chapter 4.2 below).



The Martin-Löf chi-square test

Martin-Löf (1973) has suggested an alternative test for
assessing overall goodness of fit to the Rasch model in which a
chi-square sum is built up from deviations between observed
and predicted frequencies of correct answers within each
score group.

From (3.1.4) above follows that if the model holds true:

(3.2.3)   $n_{ir} \approx \dfrac{n_r\,\varepsilon_i\,\gamma_{r-1}^{(i)}}{\gamma_r}$

If we label the vector $(n_{ir}) = (q_r)$ and call the corresponding
vector of predicted frequencies
$\left(n_r\varepsilon_i\gamma_{r-1}^{(i)}/\gamma_r\right) = (t_r)$ the test
statistic can be written:

(3.2.4)   $T = \sum_{r=1}^{k-1} \{(q_r)-(t_r)\}'\,\{((V_r))\}^{-1}\,\{(q_r)-(t_r)\}$



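in which quadratic form $((V_r))$ is a variance-covariance
matrix of order kxk with elements defined as follows:

(3.2.5)
$v_{ii} = n_r\,\dfrac{\varepsilon_i\gamma_{r-1}^{(i)}}{\gamma_r}\left(1-\dfrac{\varepsilon_i\gamma_{r-1}^{(i)}}{\gamma_r}\right)$   in the diagonal

$v_{ij} = n_r\left(\dfrac{\varepsilon_i\varepsilon_j\gamma_{r-2}^{(i,j)}}{\gamma_r} - \dfrac{\varepsilon_i\gamma_{r-1}^{(i)}\,\varepsilon_j\gamma_{r-1}^{(j)}}{\gamma_r^{2}}\right)$   for $i \ne j$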

Martin-Löf (1973) has shown that the test statistic is
asymptotically chi-square distributed with (k-1)(k-2) degrees of
freedom when each $n_r \to \infty$.

In (3.2.4) the summation is made over all score groups. If,
however, some $n_r = 0$ we have to restrict the summation to those
R groups in which $n_r > 0$. The degrees of freedom then are
(k-1)(R-1).

This test requires computation of the second derivatives of
the symmetric functions, $((\gamma_{r-2}^{(i,j)}))$. In the PML program
this is effected with the Summation algorithm, through repeated
calls to this routine with the parameter values for two items
at a time put equal to zero.

From (3.2.4) it is seen that at any step in the computations
only the $((\gamma_{r-2}^{(i,j)}))$ of one order (i.e. for one score
group) are required. With the Summation algorithm, however, the
derivatives for all the score groups are obtained, which makes
it necessary first to compute the off-diagonal values in the
variance-covariance matrices for all the score groups and
store these. Since the total number of off-diagonal elements
in the variance-covariance matrices is given by the formula
$k(k-1)^{2}/2$ it is easily seen that a vast amount of storage
space is needed when the number of items is large. For
example, when k=60, 816K bytes would be necessary to store
these elements as REAL*8 numbers.
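Both points can be illustrated briefly: how putting two item parameters equal to zero yields the symmetric functions without those two items, and how the storage figure arises. The recurrence below is a standard one and stands in for the Summation algorithm, which is not reproduced here; the function names are hypothetical.

```python
def elementary_symmetric(eps):
    """Return [gamma_0, ..., gamma_k] for the item parameters eps."""
    gamma = [1.0] + [0.0] * len(eps)
    for count, e in enumerate(eps, start=1):
        for r in range(count, 0, -1):
            gamma[r] += e * gamma[r - 1]
    return gamma

def gamma_without_pair(eps, i, j):
    """Symmetric functions with items i and j left out, obtained by
    putting their parameter values equal to zero."""
    reduced = list(eps)
    reduced[i] = 0.0
    reduced[j] = 0.0
    return elementary_symmetric(reduced)

def off_diagonal_count(k):
    """Total number of off-diagonal elements, k(k-1)^2/2."""
    return k * (k - 1) ** 2 // 2

# Storage for k = 60 items with 8 bytes per element, in units of 1024 bytes.
print(off_diagonal_count(60) * 8 / 1024)
```

For k=60 the count is 104 430 elements, or 835 440 bytes, which rounds to the 816K quoted above.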

For larger problems a sequential scratch-file is thus used
to store the elements. Since it would be rather time
consuming to read this file as many times as there are score
groups, an array is used in which the information for several
score groups is stored. The number of matrices which can be
stored in this array depends upon how large it is; it must,
however, be dimensioned at least for k(k-1) elements and the
larger it is the better. Both in this array and on the scratch
file the second derivatives are stored as single precision
numbers even though the precision used in the computations is
dependent upon whether single- or double-precision arithmetic
is chosen.

When the number of items is large the Martin-Löf test tends
to be quite time consuming to compute; not only must the
second derivatives be computed but the test requires
inversion of (at worst) k-1 matrices of the order kxk as well.
For example, for k=60 and with all $n_r > 0$ the test requires
about 7 minutes of CPU-time on the IBM 360/65. When the
number of items is moderately large, however, the amount of
computer time required is no obstacle against using the test.
For k=40 somewhat more than a minute is required and when
k=20 the test is computed in less than 20 seconds. In most
cases when the number of items is moderate this test is
faster to compute than the Andersen test.

The Martin-Löf test vs the Andersen test

Both the overall numerical tests are asymptotically
chi-square distributed (they are in fact related through a Taylor
expansion), but there may be differences in the power
characteristics of the tests as well as in their asymptotic
properties. It should also be noted that while the computation
of the Andersen test may fail at times, especially when the
sample is small, the computation of the Martin-Löf test
almost never fails. But even though the Martin-Löf test can
almost always be computed this does not imply that the results
of the test can always be trusted; when a small sample is
used and the number of items is large, quite a few score
groups will necessarily consist of only a few persons, with
the consequence that the test statistic may be far from
chi-square distributed. In order to cast at least some light on
the characteristics of the two overall goodness of fit tests
some simulation studies have been performed.

To obtain some information about the difference in the
behavior of these tests for smaller sample sizes, data were
generated so that they would conform to the model. For
generating the scores, a modified version of the routine presented
by Allerup and Sorber (1977) was used, with a version of the
feedback shift register random number generator (Lewis &
Payne, 1973) as the basic generator 1). (It should be pointed
out parenthetically that great demands are put on the basic
random number generator in these simulations since the tests,
and especially the graphic tests, are so sensitive as to be
able to pinpoint generators with less than optimal qualities.)
Data were generated only for k=15 with the size of the item
parameters chosen to vary in equal steps between -2 and 2 and
with the person parameters randomly sampled from a normal
distribution with zero mean and unit standard deviation.
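A data set of this kind can be sketched as follows. The sketch does not reproduce the Allerup and Sorber routine or the feedback shift register generator; it uses the generator of the Python standard library instead, and the function name and seed are hypothetical.

```python
import math
import random

def simulate_rasch(n, sigmas, seed=0):
    """Generate an n x k matrix of 0/1 answers under the Rasch model.

    Person parameters are drawn from N(0, 1); sigmas are the item
    difficulties on the log scale.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        xi = rng.gauss(0.0, 1.0)
        row = [1 if rng.random() < 1.0 / (1.0 + math.exp(-(xi - s))) else 0
               for s in sigmas]
        data.append(row)
    return data

k = 15
sigmas = [-2.0 + 4.0 * i / (k - 1) for i in range(k)]  # equal steps from -2 to 2
data = simulate_rasch(150, sigmas)
item_totals = [sum(row[i] for row in data) for i in range(k)]
print(item_totals)
```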

Data were generated for two sample sizes, n=150 and n=300,
each with 50 replications. The number of observed p-values
less than .05 ($N_{.05}$) and the means of the p-values ($\bar{x}_p$) for
these analyses are presented in Table 3.1.



1) I wish to thank Dr. Philip Ramsey at Hofstra University 
for putting into my hands an easy-to-use version of this 
excellent random number generator. 






Table 3.1. Results from the two overall goodness of fit tests
           for data generated to fit the model

                                   Sample size
                             150                  300
                        N.05   mean p        N.05   mean p

The Martin-Löf test       5     .57            1      --
The Andersen test         3     .48            2      --

-- : entry illegible in the source copy.



With 50 replications we should not expect more than 2 or 3
significant results at the 5 per cent level, and this is also
what is found for the Andersen test (in all replications two
groups were used in computing the Andersen test). But we also
find that the Martin-Löf test discards the model at too high a
rate for both the sample sizes.

The reason for the difference between the tests is quite
obvious when a look is taken at how they are computed. In the
Martin-Löf test all score groups are treated regardless of
their size (except when $n_r = 0$) while in the Andersen test small
score groups are pooled to form larger groups. In the present
simulations there were of course score groups which contained
only one or a few persons.

In the presentation of the results from the Martin-Löf test in
the PML program all the independent contributions to the
chi-square sum from each score group are, however, printed out,
and it was noted that in all the cases when this test
resulted in a highly significant chi-square sum a very large part
was contributed by one or two score groups consisting of only
a few persons. It is thus strongly recommended that when this
test is applied in situations where the sample is small
relative to the number of items, the contributions from the small
score groups are investigated, and that the results of this
test are put aside as soon as there is a large contribution
from any score group consisting of less than, say, 10 persons.






In investigating the power of the tests, sets of data were
generated under the two-parameter model, with varying values of
the discrimination parameter for the items. As previously,
only the case with k=15 was considered, with the item
parameters taken to be three each with the values -2, -1, 0, 1 and 2
and the person parameters chosen in the same way as above.
Data were generated to reflect three degrees of deviation
from the one-parameter model: small, with one third of the
discrimination parameters 0.9, one third 1.0 and one third 1.1;
moderate, with the discrimination parameters chosen to be 0.7,
1.0 and 1.3; and finally large, with the corresponding
discrimination parameters chosen as 0.5, 1.0 and 1.5. In all
cases the three discrimination parameters were represented at
all the five levels of item difficulty.
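The item design of these simulations can be sketched as follows, assuming the two-parameter logistic model in the form $P = 1/[1+\exp\{-a_i(\xi-\sigma_i)\}]$ with $a_i$ the discrimination of item i. The function names are hypothetical.

```python
import math
from itertools import product

def two_parameter_design(discriminations, difficulties=(-2, -1, 0, 1, 2)):
    """Cross every difficulty level with every discrimination value.

    Returns a list of (discrimination, difficulty) pairs, one per item.
    """
    return [(a, float(b)) for b, a in product(difficulties, discriminations)]

def p_correct(xi, a, sigma):
    """Probability of a correct answer under the assumed two-parameter model."""
    return 1.0 / (1.0 + math.exp(-a * (xi - sigma)))

designs = {
    "small": two_parameter_design((0.9, 1.0, 1.1)),
    "moderate": two_parameter_design((0.7, 1.0, 1.3)),
    "large": two_parameter_design((0.5, 1.0, 1.5)),
}
print(len(designs["large"]), designs["large"][:3])
```

With all discriminations equal to 1.0 the same function gives the one-parameter model, so the three designs differ from it only in the spread of the discrimination values.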

Three different sample sizes were used: 150, 300 and 1 000,
and 10 replications were made. The results are presented in
Table 3.2.

Table 3.2. Results from the two overall tests for data
           generated to deviate from the model

                                    Amount of deviation
                          Small           Moderate          Large
                       N.05  mean p     N.05  mean p     N.05  mean p

n = 150
The Martin-Löf test      2    .41        --    .36        --    .15
The Andersen test        1    .42        -- 1) .27        -- 2) .01

n = 300
The Martin-Löf test      0    .66        --    .20         7    .07
The Andersen test        0    .41         6    .10        10    .00

n = 1 000
The Martin-Löf test      0    .50         9    .01        10    .00
The Andersen test        1    .14        10    .00        10    .00

1) The Andersen test could be computed in only 9 cases.

2) The Andersen test could be computed in only 8 cases.

-- : entry illegible in the source copy. For n = 300 and
moderate deviation only one of the two N.05 entries (6) is
legible; it is shown here in the row of the Andersen test.



We find that when there are only small deviations from the
one-parameter model there is no possibility with the sample
sizes used here to detect any deviation from the model (it
will be shown below that even though highly significant
values of the test statistics are obtained when the sample size
is heavily increased there would still be reason to accept
the model with this amount of deviation in the data).

With large deviations from the model we find that the
Andersen test in all successful analyses, for all the sample sizes,
discards the model, while the Martin-Löf test discards the
model in all analyses only for the sample size 1 000. At
least for deviations from the model caused by varying
discrimination among the items the power of the Andersen
test thus appears to be greater than the power of the
Martin-Löf test.

For the intermediate case with medium deviations we do find
indications, too, that the Andersen test is more powerful
than the Martin-Löf test but it can also be noted that only
for the largest sample does the former test consistently
discard the model.

Even though these simulations are merely some examples it does
seem as if the conclusion can be drawn that the likelihood
ratio test has somewhat better properties than the chi-square
test both with respect to the number of observations needed
to claim that the test has the assumed distribution and with
respect to power.

3.3 Redundancy

No model is ever completely true in describing a set of data,
which means that with a sufficient number of observations
any goodness of fit test would discard the model. In
discussing this problem Martin-Löf (1974a) stated:

"This indicates that for large sets of data it is too
destructive to let an ordinary significance test decide
whether or not to accept a proposed statistical model,
because, with few exceptions, we know that we shall have
to reject it even without looking at the data simply
because the number of observations is so large. In such
cases we need instead a quantitative measure of the size
of the discrepancy between the statistical model and the
observed set of data..." (p. 3).



Martin-L5f derived such a measure called redundancy (R) fron 
concepts in the statistical information theory, which on an 
absolute scale measures the deviation between a statistical 
model and a set of data. The redundancy exists in two forms: 
the micro-canonical redundancy corresponding to non-parametric 
formulations of the test and the canonical redundancy corre- 
sponding to parametric formulations. The canonical redundancy, 
which is of course the only one that is accesible in tests of 
the Rasch model, should be regarded as an approximation to the 
microcanonical redundancy and both can be given the same inter 
pretation : 

"it is the relative decrease in the number of binary
units needed to specify the given set of data when we
take into account the regularities that we detect by
means of the exact test" (Martin-Löf, 1974, p. 10).

Since the measure reflects a relative decrease it assumes
values between 0 and 1, and low values indicate a good fit
between the model and the data.

The canonical redundancy can easily be computed from the
likelihood ratio quotients (3.2.2) or (3.2.3) above together
with the maximum of the logarithm of the likelihood function
(3.2.1):

(3.3.1)    $R = \dfrac{\log \lambda}{H}$

It is also possible to compute R from the Martin-Löf chi-square
test, which gives an approximation for R in the formula above:

(3.3.2)    $R = -\dfrac{\chi^2}{2H}$

Since the scale upon which R is expressed is in a sense
absolute it is possible to use case studies for calibrating it.
Martin-Löf (1974a) computed the values of R for different
values of the binomial probability p with respect to the
hypothesis p=.5, with the results presented below:



p                  R        Fit

.000 or 1.000     1.        Worst possible
.316 or  .684      .1       Very bad
.441 or  .559      .01      Bad
.482 or  .518      .001     Good
.494 or  .506      .0001    Very good
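The calibration values can be checked numerically. The sketch below assumes that, for a single binomial probability p tested against p=.5, the redundancy equals one minus the binary entropy of p measured in bits; this reading is an assumption made here for illustration (the function names are hypothetical), but it reproduces the tabulated values of R to rounding.

```python
import math

def binary_entropy(p):
    """Entropy in bits of a binary variable with success probability p."""
    if p in (0.0, 1.0):
        return 0.0  # convention: 0 * log(0) = 0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))

def redundancy_vs_half(p):
    """Relative saving in bits compared with coding under p = .5,
    where every observation costs exactly one bit (assumed definition)."""
    return 1.0 - binary_entropy(p)

# Probabilities taken from the calibration table.
for p in (0.000, 0.316, 0.441, 0.482, 0.494):
    print(f"p = {p:.3f}   R = {redundancy_vs_half(p):.4f}")
```

Because the entropy is symmetric around .5, the same value of R is obtained for p and 1-p, which is why each row of the table can list two probabilities.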

It might be of some interest to compare this calibration of
the redundancy scale with the results which can be observed
for R when very large sets of data with known deviations
from the model are generated. Data have thus been generated
under the two-parameter model with different values of the
discrimination parameter for the items. In all analyses 15
items were used with the same 5 levels of difficulty
parameters as in the simulations investigating power presented
above. The sample size was 50 000 persons (N(0,1)) and three
different discrimination parameters, all represented at all
levels of difficulty, were used.



        Discrimination
Case    parameters

1       1.00   1.00   1.00
2        .95   1.00   1.05
3        .90   1.00   1.10
4        .85   1.00   1.15
5        .80   1.00   1.20

The Andersen test

Case    χ²        df     R

1        185.2    182    .0003
2        [?]      168    .0005
3        455.4    154    .0008
4        933.2    182    .0017
5       1504.7    182    .0028

The Martin-Löf test

Case    χ²        df     R

1        188.0    182    .0003
2        254.2    182    .0005
3        [?]      182    .0009
4        933.6    182    .0017
5       1498.7    182    .0028



In case 1, using data fitting the model, we find non-significant
values of the test statistics and the redundancy indicates
a "very good" fit. In all the other cases the statistical
tests are very highly significant, but at least for some of
them the value of R is low enough to indicate an acceptable
fit.

For case 2 the value of R is .0005, which on the scale
established by Martin-Löf corresponds to a fit that is "good"
to "very good". The graphic and binomial tests of the items
in this analysis showed no signs of systematic deviations
from the model and would thus have been useless to improve
the fit. (Had a plotting method yielding greater accuracy
been used, such as the one in the Allerup & Sorber, 1977,
program, it might of course have been possible.)

For case 3 the value of R indicates a "good" fit. Here,
however, the graphic tests could be used to identify all the
deviating items. Here there is thus a choice of whether to
improve the fit through selecting items, or to accept the
fit as satisfactory.

The other cases all show a fit which is worse than "good"
and in all these analyses both the graphic and the binomial
tests could clearly be used to identify the deviating items.

The results obtained in case 2 show that it is possible to
observe a highly significant deviation from the model with an
inferential test while at the same time it is impossible to find
any deviations with descriptive methods. If in such a case the
redundancy is sufficiently low, less than .001 say, we have a
good basis for accepting the model in spite of the significant
test statistic.

If the redundancy is low and it is possible to use the results
from the graphic tests to improve the fit, we have the choice
of doing so or of accepting the model as showing a good fit to
the data. In making this decision it does seem necessary to
invoke other than statistical criteria, such as content-related
considerations.




To prevent any misunderstanding it should finally be pointed
out that the redundancy statistic is of interest only when the
number of observations is large; a high redundancy observed
for a smaller sample is not necessarily a sign of a poor fit.






Chapter 4
CONSTRUCTING RASCH SCALES



It has repeatedly been stressed that the Rasch model is the
LT model which entails the strongest assumptions, and even
though no model is ever wholly valid for describing a set of
data, serious deviations from the assumptions will invalidate
most attempts to capitalize on the great potentialities for
applications in the model. Thus, whatever eventual application
is intended, one inevitable first step is to make sure
that the data do show a reasonable fit to the model, and if
they don't, take the necessary precautions to make sure that
they do.

In the introduction it was mentioned that the Rasch model has
already been applied to some extent and surely some experience
has accumulated as to possible sources of threats to the model.

But it must also be stressed that in the applications carried
out on the European continent as well as in North America the
problems of testing goodness of fit have been taken rather
lightly, which is almost surely a consequence of the fact
that the procedures employed for testing goodness of fit have
less than optimal properties. In fact, there are very few
studies where the test procedures developed on the basis of
the conditional approach have been used for other than
illustrative purposes.

In this context it will not be possible either to present much
more than illustrations of applications, but the important
point to note is that there is still much research to be
carried out on the sources of deviations from the model and
how to remedy them.

Before the possible sources of deviations from the model are
discussed, analyses of two tests of PMA-type (Primary Mental
Abilities), developed within the framework of classical test
theory, will be presented.






4.1 Analyses of two tests of PMA-type



The two tests to be analyzed are Number Series and Opposites,
constructed to measure inductive (or non-verbal reasoning) and
verbal ability respectively. The tests were constructed by
Svensson (1964, 1971) and the only reason for choosing these
tests was simple access to data, which consist of a sample of
566 fifth-graders (see Gustafsson, 1976, for a detailed
account of why and how the data were collected).

Each of the items in Number Series consists of a series of
six numbers and the task is to add the two following numbers.
The time limit of the test is 18 minutes.
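As a concrete picture of the item format, the sketch below generates one series of six numbers from a simple additive rule and the two numbers an examinee would be expected to supply. The rule a_(n+1) = a_n + 2 with a_1 = 3 is used here only as an illustration of the kind of recursion the items are built on; the function names are hypothetical.

```python
def series(start, step, length=6):
    """First `length` terms of a_(n+1) = a_n + step with a_1 = start."""
    terms = [start]
    while len(terms) < length:
        terms.append(terms[-1] + step)
    return terms

def expected_answer(shown, step):
    """The two numbers the examinee is asked to add to the series."""
    return [shown[-1] + step, shown[-1] + 2 * step]

shown = series(3, 2)
print("shown: ", shown)
print("answer:", expected_answer(shown, 2))
```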

In Opposites, which also consists of 40 items, the task
is to select from among four given words the one which is the
opposite of a given word. This test too is timed, with the
limit being 10 minutes.

Opposites is thus a multiple-choice test which allows guessing
and can for this reason alone be supposed to show a poor fit
to the model. But it is of course of some interest to
investigate in what ways this kind of violation of model
assumptions expresses itself in the model tests. Number Series, in
contrast, requires constructed responses, which means at least
that guessing is minimized as a source of deviation from the
model. Rasch (1960), who also investigated the fit of some
previously existing tests to the model, in fact found, with
graphic methods, a good fit for a test highly similar to
Number Series.

Number Series 

With a sample of 566 persons and a test with 40 items it is
obvious that quite a few score groups will be very small; an
attempt was thus made to use the Andersen test to investigate
the overall fit of the test. This test could, however, not be
computed for the original set of items since easy items were
solved by all persons in almost all the score groups, excepting
only some of the lowest, while two of the most difficult
items were solved only by a few persons in some of the highest
score groups. When the five easiest and the two most difficult
items were excluded, however, the score groups could
successfully be grouped into four groups, with the value of the test
statistic being 349.2 with 96 degrees of freedom, which is of
course highly significant.
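The 96 degrees of freedom can be checked by hand if the Andersen test is taken to have (k - 1)(g - 1) degrees of freedom for k items and g groups of score groups. That rule is an assumption in this sketch, since the test itself is defined in Chapter 3, but it agrees with the figure reported above. The function name is hypothetical.

```python
def andersen_df(n_items, n_groups):
    """Degrees of freedom under the assumed rule (k - 1)(g - 1)."""
    return (n_items - 1) * (n_groups - 1)

n_items = 40 - 5 - 2   # five easiest and two most difficult items excluded
n_groups = 4           # score groups collapsed into four groups
print(andersen_df(n_items, n_groups))
```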

Thus, if we had hoped to find a good fit for the Number Series
test to the Rasch model, there is reason for disappointment.
But on the other hand it will be instructive to find out
the reasons for the poor fit of this test.

Several factors may, singly or in combination, be responsible
for the poor fit: item heterogeneity, speededness of the test,
learning effects from one item to another, varying item
discriminations, just to mention a few. In searching for the
cause or causes of the deviations, the information which is of
most help is the graphic test of each item, along with, of
course, the content of each item and every piece of information
about the testing situation which can be found.

The items in the test have been analyzed and the recursive
formulas defining the series have been determined. These
algorithms are presented in Table 4.1 along with the proportions
of correct answers and rough summaries of the graphic
tests, in which for the lower and the higher score groups
+ and - signs have been used to indicate whether the observed
proportion of correct answers is higher or lower than the
predicted proportion.




Table 4.1. The recursive formulas defining the items in the
Number Series test.

                                                            Low     High
      Prop.   Algorithm                                     score   score
Item  corr.   a_(n+1)          Start values                 groups  groups

 2    .97     [?]              a_1=1, a_2=1
 3    .98     [?]              a_1=9
 4    .98     [?]              a_1=1, a_2=[?]
 5    .97     [?]              a_1=18
 6    .93     a_n + 2          a_1=3
 7    [?]     a_n - [?]        a_1=[?]
 8    .87     a_n - 4          a_1=29
 9    .85     a_n + 7          a_1=10
10    .77     a_n [?]          a_1=51
11    .63     a_(n-1) + 2      a_1=2, n=1,3,...
              a_(n-1) [?]      a_2=2, n=2,4,...
12    .56     a_n * 2          a_1=2
13    .67     a_(n-1) + 3      a_1=7, a_2=8
14    .57     a_(n-1) [?]      a_1=5, a_2=7
15    .64     [?]              a_1=[?], a_2=8
16    .54     [?]              a_1=5
17    .50     a_n * n - 1      a_1=2
18    .51     [?]              a_1=22, a_2=21
19    [?]     a_n * n + 2      a_1=3
20    [?]     a_(n-1) - 5      a_1=19, a_2=17
21    .43     a_n * 2          a_1=3
22    [?]     a_(n-1) - 1      a_1=12, a_2=13
23    .41     a_n - 10 + n     a_1=43
24    .45     a_(n-1) + 9      a_1=5, a_2=11
25    [?]     a_n + 2(n-1)     a_1=5                                +
26    [?]     a_(n-1) - 2      a_1=[?], a_2=[?]
27    .31     a_(n-1) + 3      a_1=17, a_2=[?]
28    .40     a_(n-1) * 2      a_1=6, a_2=12
29    .35     a_(n-1) [?]      a_1=128, a_2=64                      +
30    .25     a_(n-2) - 5      a_1=20, a_2=18, a_3=16       -       +
31    .28     a_(n-1) + 2      a_1=1, a_2=4                         +
32    .29     a_(n-2) + 5      a_1=1, a_2=3, a_3=5
33    .32     a_(n-1) + 1      a_1=1, n=1,3,5
                               a_2=2, n=2,4,6
34    .20     a_n + 1          a_1=1, n=2,5,8,...                   +
              a_(n-1) + 1      n=4,7,10,...
              [?]              n=3,6,9,...
35    .13     a_(n-2) + 16     a_1=13, a_2=15, a_3=22
36    .17     [?]              a_1=1, a_2=2, a_3=3                  +
37    .09     a_n + 1          a_1=3, n=4,7,10,...                  +
              [?]              n=2,5,8,...
              a_(n-1) + a_n    n=3,6,9,...



Looking at the pattern of deviations from the model as
evidenced by the graphic tests we find that for most of the
items late in the test the observed proportion is too high
for the higher score groups and too low for the lower score
groups. If the items appearing late in the test have a higher
discrimination parameter such a pattern of results would be
found, but there are other explanations as well, of which
speededness of the test appears to be the most reasonable. For
the items with order numbers around 30 almost half the sample
did in fact not attempt any answer, correct or incorrect,
which is a strong indication that a large proportion of the
sample did not even attempt to solve the items appearing
later in the test. Additional evidence in favor of this
interpretation is obtained from the algorithms for the items. The
recursive formulas for items 27 and 31 are in fact essentially
the same as those for items 13 to 15, for example, and still
the items appearing early have proportions of correct answers
which are almost twice as large as those for the items
appearing later in the test. This must be regarded as a very
strong indication that the test is speeded in the sense that,
if given additional time, some persons would get additional
items correct (or for that matter, that there may be some
other reason, such as boredom, accounting for why some of
the examinees did not attempt the items later in the test).

If speededness, or some other factor with equivalent effects,
is the only reason for the poor fit of the whole test, we
should expect a good fit for items placed early in the test.
Since omitted responses were coded in a special way it has
been possible to determine the proportion of omitted responses
for each item, and this proportion was found to be fairly
low, never exceeding 20%, for item 22 and earlier items,
while there was a rather rapid increase in the proportion of
omitted responses for the items from number 23 to the end of
the test.

A new analysis was thus performed including only items 2-22.
This analysis too resulted in a highly significant χ² value
of 57.6 with 20 degrees of freedom (the Andersen test with
the score groups grouped into two groups). Again the graphic
tests of the items were resorted to and these indicated a
poor fit for items 9, 10 and 11, with the fit being worst
for item 11. For the higher score groups there was for this
item a too low observed proportion of correct answers and for
the lower score groups the observed proportion was too high.
Just a glance at the recursive formula for this item (see
Table 4.1) is sufficient to show that it deviates from those
for the other items early in the test in that it defines two
intertwined series defined by different rules. Obviously this
item measures at least partly an ability which is different
from the ability measured by the other items in the early
part of the test.



The graphic tests for items 9 and 10 gave a pattern very much
like that found for item 11, but less pronounced. The
algorithms for these two items are the same as for the four items
immediately preceding them. What obviously makes items 9 and
10 more difficult, and also makes them show a poor fit, is that
they pose requirements for arithmetical ability: they require
computation of expressions like 45-38, which is a task which
pupils in the fifth grade have a high probability of failing
(Kilborn & Johanson, 1976). (It can parenthetically be
mentioned that when the second author above was asked to
identify those items in the early part of the test posing
exceptional demands for arithmetic skill, items 9 and 10 were clearly
identified and a few more with some doubt.) Thus we can draw
the conclusion that the reason why items 9 and 10 do not fit
together with the other items is multidimensionality of the
latent space, i.e. performance on these items is affected by
arithmetical skill in addition to the ability measured by the
other items.






A new analysis was performed in which these three items were
excluded, with the result that the Andersen test gave χ² = 28.4
with 17 degrees of freedom, with a corresponding p-value of
.04, which will here quite arbitrarily be regarded as an
acceptable fit.
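The reported p-value can be reproduced without statistical tables. The sketch below is one way of doing so, using the standard closed-form series for the upper tail of a chi-square distribution with integer degrees of freedom. The function name is hypothetical, and the code is an illustration only, not part of the computer program described in this report.

```python
import math

def chi2_upper_tail(x, df):
    """P(X > x) for a chi-square variate with a positive integer df."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term = math.exp(-half)
        total = term
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return total
    # Odd df: 2(1 - Phi(sqrt(x))) plus a finite correction series.
    root = math.sqrt(x)
    total = math.erfc(root / math.sqrt(2.0))
    term = math.sqrt(2.0 / math.pi) * math.exp(-half) * root
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return total

# Statistics reported in the text for the two reduced item sets.
for chi2, df in ((57.6, 20), (28.4, 17)):
    print(f"chi2 = {chi2}, df = {df}, p = {chi2_upper_tail(chi2, df):.4f}")
```

The first value comes out far below .001 and the second close to .04, in line with the description of the two analyses.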

In passing it can be mentioned that the Martin-Löf test for
the same items resulted in a very highly significant value
of the test statistic (χ² = 763.3, df = 272). A very large part
of the χ²-sum (457.6) was, however, contributed by score
group 2, consisting of one single examinee who had answered
items 15 and 20 correctly. The results from this test must
thus obviously be set aside (cf. page 55 above).



Even though the overall test indicates that an acceptable fit
was finally obtained, the graphic tests could be used to
select a still more homogeneous item set or perhaps several item
sets. There were, for example, some indications that ascending
and descending series gave slightly different results. However,
since we in this case are restricted to a very limited set of
items there is but little to be gained from pursuing such
analyses.

In conclusion, we have thus learned that unless a reasonable
number of examinees have attempted the items and unless
influence from other abilities is controlled for, the data
will not fit the model. But it should also be pointed out that
we have made a heavy selection among the items and have thus
to some degree capitalized on chance effects. Thus, the fit
of a set of items selected from a larger pool on the basis of
the results in one sample should be tested in another sample,
for purposes of cross-validation.



Opposites 

Analysis of the items in Opposites with the Andersen test
resulted in χ² = 333.4 with 117 df, which is of course highly
significant.

Table 4.2 presents gross summaries of the graphic tests of
the items (the first two items have been excluded since they
were answered correctly by almost all persons; thus little
information is gained by keeping them, but as soon as they are
included in an analysis a large number of iterations is
required for convergence). As before, the method of marking too
high and too low observed proportions of correct answers for
lower and higher score groups with + and - signs has been
used. When looking at the pattern of signs it should, however,
be kept in mind that they represent a very simplified
description, there sometimes being important differences between the
plots for items with the same pattern of signs.

Table 4.2. Summary of the graphic tests of the items in
Opposites.

              Score groups                   Score groups
Item  Prop.   Low     High     Item  Prop.   Low     High

 3    .98                      22    .60     -       +
 4    .97                      23    .51     -       +
 5    .89     -       +        24    .37     +       -
 6    .90     -       +        25    .44     +       -
 7    .71                      26    .31     +       -
 8    .71     -       +        27    .32     +       -
 9    .75     -       +        28    .19
10    .80     -       +        29    .39
11    .58                      30    .39     +       -
12    .69     -       +        31    .16     +       -
13    .72     -       +        32    .33
14    .69     -       +        33    .25     +       -
15    .66                      34    .27     +       -
16    .56                      35    .22     -       +
17    .60                      36    .29     +       -
18    [?]     -       +        37    .22     -       +
19    .69     +       -        38    .11     -       +
20    .53     -       +        39    .25     +       -
21    .38                      40    .22     +       -





Nevertheless the deviations form quite a clear pattern: for
the items late in the test, which are also the more difficult
ones, there tends to be a too high observed proportion of
correct answers for the lower score groups and a too low
proportion for the higher score groups, while the reverse pattern
of deviations is found for the easier items. This is exactly
the pattern to be expected when a test permits guessing: on
the difficult items the examinees with low ability will get
scores which are too high by guessing, with the consequence
that their ability is overestimated, which in turn implies that
on the easier items, where the proportion of the sample which
guesses is smaller, the low ability examinees will appear to
perform too poorly.
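The mechanism can be illustrated with a small calculation. The sketch below compares the Rasch probability of a correct answer with the same curve given a lower asymptote c, as in a model that allows for guessing. The value c = .25 (four alternatives, purely random guessing) and the ability and difficulty values are assumptions chosen for illustration, and the function names are hypothetical.

```python
import math

def rasch(theta, b):
    """Rasch probability of a correct answer for ability theta, difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def with_guessing(theta, b, c=0.25):
    """Same curve with a lower asymptote c (assumed random-guessing level)."""
    return c + (1.0 - c) * rasch(theta, b)

for theta in (-2.0, 0.0, 2.0):          # low, medium and high ability
    for b in (-1.0, 2.0):               # an easy and a difficult item
        gain = with_guessing(theta, b) - rasch(theta, b)
        print(f"theta = {theta:+.1f}  b = {b:+.1f}  gain from guessing = {gain:.3f}")
```

The gain is largest where low ability meets a difficult item and almost vanishes for able examinees on easy items, which is the asymmetry behind the + - pattern described above.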




What is perhaps more interesting than this general pattern is
that there, nevertheless, are items which do not conform to
it: some of the most difficult items do not appear to be
affected by random guessing and there are in fact a few items
(35, 37 and 38) with a very low proportion correct (lower
actually than would be expected if all the examinees guessed
randomly) on which the - + rather than the + - pattern is
observed. It may be interesting to take a closer look at these
items, which are "good" items in the sense that if it were
possible to estimate the discrimination parameters, this
parameter would be found to be high for these items.

The three items are presented in Table 4.3 together with the
percentage of subjects marking each alternative.

Table 4.3. Three difficult, highly discriminating items in
Opposites.

                    Choices
Item  Stem          1             2                 3                4                 No resp.

35    [?]           [?] (22)      [?] (33)          [?]              [?]               [?]
37    Significant   Unclear (14)  Unimportant (22)  Despised (13)    Meaningless (41)  (10)
38    Ample         Poor (63)     Impoverished (9)  Magnificent (8)  Scanty (11)       (9)



What is especially striking, particularly for items 37 and
38, is the high percentage of examinees marking one of the
distractors. If the content of the items is looked at, it
does become obvious, however, why one of the incorrect
alternatives is so attractive. In item 37 the majority of the
examinees have chosen "meaningless" as the opposite of
"significant". I suspect that even in English this distractor
would be quite attractive, but it must be so to an even
higher degree in Swedish since the Swedish counterpart of
significant can be literally translated as "meaningful".
Obviously, many of the examinees, not knowing the exact
meaning of the words, were fooled by their literal appearances
to choose this particular distractor.

The same explanation holds true for item 38, even though this
is here less clear from the translation into English. The
literal translation from Swedish into English of ample is,
however, "richlike", which makes it understandable why more than
60 per cent of the sample chose "poor" as the opposite of
ample.

These examples provide an explanation of why some
multiple-choice items don't show evidence of any guessing effects:
if one (or more) of the distractors is so attractive that
almost all of those who don't know the correct answer choose it,
of course little or no random guessing will take place (cf.
Lord, 1974a and page 8 above). From this it follows that it
is at least in principle possible to construct multiple-choice
tests where guessing will only to a small degree be another
factor affecting performance. Whether it is possible to
construct such a multiple-choice test in practice is of course
more doubtful, and is probably not worth the attempt.

We will now embark on an exercise intended to serve above
all as a warning. The usual practice in item screening to
obtain fit to the model is to try out a larger set of items
on a sample and select those that appear to fit the model.
We will investigate whether this is possible here, in which
case we know that such a procedure can yield only essentially
meaningless results as a consequence of the fact that all the
items are of multiple-choice type and thus are influenced by
guessing.

There are essentially three types of items to be found in
Table 4.2: those with no signs marked, those with the + -
pattern and those with the - + pattern, corresponding to
items with intermediate, low and high discrimination,
respectively (the items will be referred to as MD, LD and HD items).
It might be argued that those items without any signs marked
are those that should be selected since they do not show any
deviation from the model. Not much thought is required,
however, to detect that this is incorrect: items do not show
fit or lack of fit to the model in themselves; the model
assumption instead says that the items should be homogeneous,
i.e. that each item should fit together with the other items.

This implies that if we analyze the three groups of items
separately, we should expect to find three sets of items which
each form a scale conforming to the requirements of the Rasch
model.

Such analyses have been performed using every item listed in
Table 4.2 except item 15, since the results of the graphic test
for this item did not conform to the results of any other
item. The results from the goodness of fit tests (the Andersen
test) are presented below:



Type of item   Number of items   χ²      df    p      ISS

LD             12                33.0    33    [?]    .27
MD             10                19.8    27    [?]    [?]
HD             15                49.6    42    .20    .72



We thus find that the reasoning was correct; for each set of
items a good fit is found. It can also be observed that the
ISS (see page 41) is considerably higher for the HD than for
the LD items.

Two conclusions can be drawn from this exercise. First, the
question of item fit is wrongly stated if it is asked
whether an item does or does not fit the model; the correct
question to ask is whether any given item fits together with
the other items. This implies in turn that in most cases
analysis of an item pool should not result in the selection of a
subset of items which are "good" in relation to the
requirements of the model; instead grouping of items into internally
homogeneous scales is the result to be sought, and throughout,
of course, attempts should be made to clarify what each scale
is measuring.

The second conclusion to be drawn is purely negative:
obviously it is very easy to select items from a pool so as to form
scales conforming to the model, but in this case it is almost
equally obvious that the result is nonsensical, since if those
groups of items were administered to a new sample with only a
slightly different distribution of person parameters a poor
fit would be found. What has been done can probably best be
described as a capitalization on trading relationships between
assumptions; for example the amount of guessing is different
on the items and the discrimination can be supposed to vary.
These two factors can blend and balance in different ways for
different items, with the net result being that items which
are very different in both these respects can be found to fit
together.

4.2 Item bias in Opposites

The overall numerical tests as well as the graphic tests are
constructed from the starting point that the item parameters
should remain the same for all levels of score groups, and
evidently these tests are powerful means of guarding against
violation of certain kinds of model assumptions such as
varying item discrimination. The Rasch model, however, states that
the item parameters shall be the same whichever subdivision of
a sample is made, and the tests based on the results for
groups with different levels of performance need not be
powerful when some items are too easy for one subgroup and too
difficult for another if the overall level of performance of
the groups is equivalent.

This problem of analysis, of what has been termed item bias,
will be illustrated through analyses of sex differences in
Opposites.

As was mentioned above on page 53, the Andersen test (equation
3.2.3) can be used to test differences between the estimates
of item parameters obtained in any disjoint grouping of the
sample through performing some simple hand calculations on
figures found in the computer printout, i.e. the maxima of
the log likelihood.

The item parameters were first estimated separately for boys
and girls for items 3-40 in Opposites, with the resulting
values of H being -4905.31 and -4834.41 respectively. Those
values, together with the value of H of -9795.40 found with
both groups pooled, were put into formula (3.2.3) with a
resulting χ² = 111.36 with 37 df, which is highly significant.
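The hand calculation can be sketched as follows. It is assumed here that formula (3.2.3) amounts to twice the difference between the sum of the group-wise maxima of the log likelihood and the pooled maximum, and that the three maxima read -4905.31 (boys), -4834.41 (girls) and -9795.40 (pooled); with those readings the reported statistic is reproduced exactly. The function name is hypothetical.

```python
def andersen_lr(group_logliks, pooled_loglik):
    """Likelihood ratio statistic: 2 * (sum of group maxima - pooled maximum)."""
    return 2.0 * (sum(group_logliks) - pooled_loglik)

h_boys, h_girls, h_pooled = -4905.31, -4834.41, -9795.40
chi2 = andersen_lr([h_boys, h_girls], h_pooled)

n_items, n_groups = 38, 2                  # items 3-40, boys and girls
df = (n_items - 1) * (n_groups - 1)

print(f"chi2 = {chi2:.2f} with {df} df")
```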

Had there been large differences in the level of performance
of boys and girls it could have been argued that the
significance does not reflect anything except the kind of deviations
already detected with the overall goodness of fit test. Since
this is not the case (even though the mean for boys is
slightly higher) we can go on to study which items tend to favor
boys and girls respectively.

Since estimates of the standard errors are obtained along with
the (normally distributed) estimates of the item parameters
in each analysis, a z-test for the difference between the
parameters for each item can easily be computed (Fischer, 1974,
p. 297):

(4.2.1)    $z_i = \dfrac{\hat\sigma_{i1} - \hat\sigma_{i2}}{\sqrt{s_{i1}^2 + s_{i2}^2}} \qquad (i = 1, \ldots, k)$

where the subscripts 1 and 2 refer to the groups and s denotes
the standard error of the corresponding estimate.
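A minimal sketch of such a z-test is given below, assuming the usual form in which the difference between two independent estimates is divided by the square root of the sum of their squared standard errors. The parameter estimates used are those listed for item 22 in this section (-.16 for boys, -.58 for girls); the standard errors of .133 are hypothetical values chosen for illustration, since the standard errors themselves are not tabulated.

```python
import math

def z_difference(est1, se1, est2, se2):
    """z statistic for the difference between two independent estimates."""
    return (est1 - est2) / math.sqrt(se1 ** 2 + se2 ** 2)

# Group 1 = boys, group 2 = girls; standard errors are hypothetical.
z = z_difference(-0.16, 0.133, -0.58, 0.133)
print(f"z = {z:.2f}")
```

A positive z means the item parameter is higher, that is, the item is more difficult, in the first group.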

The item parameters, along with the results from the
statistical test, are presented in Table 4.4. A negative sign of z
indicates a lower value of the item parameter for boys, i.e.
that the item is easier for boys. There are four items for
which a significant difference in favor of boys is found and
for four items a significant difference is found in favor of
girls.

The words which are "too easy" for boys are "srurl", "attack",
"noble" and "foolhardy", and the words which are "too easy" for
girls are "smooth", "desert", "merry" and "anonymous". This is
not the place to venture into a discussion of why certain items
are biased in a certain way, but at least for the "boys' items"
it does appear as if they are related to what is regarded as
boys' activities (cf. Wernersson, 1977).



Table 4.4. Tests of equality of estimated item parameters for
boys and girls for items 3-40 in Opposites.

      Item parameters                    Item parameters
Item  Boys     Girls     z          Item  Boys    Girls    z

 3    [?]      -4.07      -.36      22    -.16    -.58     2.23*
 4    -3.16    -3.78      1.36      23     .03     .07     -.24
 5    -2.26    -2.22      -.16      24     .39    1.03    -3.40**
 6    -2.[?]   -2.50      1.25      25    [?]     [?]      1.15
 7    -1.05     -.75     -1.56      26     .94    1.00     -.29
 8    -1.34     -.52     -4.03**    27     .92     .90      .10
 9    -1.01    -1.26      1.21      28    1.60    1.69     -.41
10    -1.27    -1.[?]2    1.62      29     .73    [?]      1.59
11     -.20     -.33       .72      30     .63     .57      .33
12     -.72     -.93      [?]       31    2.04    1.82      .92
13     -.91    -1.00      [?]       32     .98     .74     1.22
14     -.60    -1.06      [?]       33    1.20    1.45    -1.21
15     -.48     -.87      2.07*     34    1.10    1.34    -1.16
16     -.21    [?]        [?]       35    1.18    1.85    -3.07**
17     -.51     -.20     -1.70      36    1.53    1.91    -1.6[?]
18     -.49    [?]        2.91*     37    1.44    1.59     -.68
19    -1.27     -.44     -4.15**    38    2.49    2.22      .96
20     -.02    [?]         .11      39    1.32    1.36     -.20
21      .75      .51      1.32      40    1.40    1.59     -.88



It will be recalled that in the analyses performed on
Opposites in the previous section, it was possible to divide the
items into three groups, within each of which a good fit to
the model was observed. It can be asked how this grouping of
the items is related to sex bias. According to the signs of
the z-test presented in Table 4.4 each item in the three
groups was classified according to whether it tended to be
biased in favor of boys or girls. The results are presented
below:



               Tendency towards bias
               in favor of

Item type      Boys    Girls    Total

LD              8       4       12
MD              5       5       10
HD              5      10       15

Total          18      19       37

There is a correlation: the items which were identified as
having a low discrimination tend to be biased against girls,
while the HD items tend to be biased in favor of girls. But,
on the other hand, a closer scrutiny of Tables 4.2 and 4.4
reveals that two of the items significantly favoring boys
are of the LD type, while the other two are of the HD type.
There is thus a considerable heterogeneity within the groups
of items with respect to sex bias, which in turn implies that
the three scales previously found to fit the model may well
have to be discarded on the basis of an analysis of sex
differences.



Three separate analyses have thus been performed in which the
Andersen test was used to test sex differences for each of the
three scales:









Scale      Test statistic      df      p

LD              28.1           11      p < .01
MD               4.0            9      ns
HD              57.7           [?]     p < .001



For two of the scales there are significant differences between
boys and girls with respect to the item parameters in spite of
the fact that the overall goodness of fit tests did not indicate
any reason for discarding the model. Thus, even though one test
shows a good fit, another can show a poor fit, which is of
course due to the fact that the tests have differential power of
detecting different deviations from the model.

It is of some interest to study the distribution of person
parameters for boys and girls on the three sub-scales.
Martin-Löf (1973) has presented two tests for comparison of the
distributions of the person parameters for two groups. One of
the tests is a likelihood ratio test and the other is a χ² sum.
If we use the subscript e (e=1,2) to denote the groups we can
write the likelihood ratio test:

(4.2.2)   $$\log\lambda = -\sum_{e=1}^{2}\sum_{r=0}^{k} n_{er}\log\frac{n_{er}}{n_{e}} + \sum_{r=0}^{k} n_{r}\log\frac{n_{r}}{n}$$



Since the index r here varies from 0 to k the test has k
degrees of freedom, and as before -2 log λ is asymptotically
chi-square distributed as n → ∞.

The χ²-test is computed according to the following formula:

(4.2.3)   [formula illegible in the source copy]



This test, too, has k degrees of freedom and is asymptotically
chi-square distributed under the same condition as the
likelihood ratio test.
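
As a rough illustration of the two tests just described, the following Python sketch computes -2 log λ and a χ² sum from the raw-score frequencies of two groups. The likelihood ratio part follows the usual multinomial form of (4.2.2); the χ² part uses the ordinary Pearson statistic for homogeneity of two distributions, which is an assumption about the form of (4.2.3). The function name and the argument layout are hypothetical and are not taken from the computer program described in the report.

```python
import math


def score_distribution_tests(counts_1, counts_2):
    """Compare the raw-score distributions of two groups.

    counts_e[r] is the number of persons in group e with raw score r,
    for r = 0..k. Returns (-2 log lambda, Pearson chi-square, df).
    """
    if len(counts_1) != len(counts_2):
        raise ValueError("both groups need frequencies for scores 0..k")
    n1, n2 = sum(counts_1), sum(counts_2)
    n = n1 + n2
    log_lambda = 0.0
    chi_square = 0.0
    for n1r, n2r in zip(counts_1, counts_2):
        nr = n1r + n2r
        if nr == 0:
            continue  # an empty score group contributes nothing
        # pooled term of the likelihood ratio
        log_lambda += nr * math.log(nr / n)
        for ner, ne in ((n1r, n1), (n2r, n2)):
            # group-specific term of the likelihood ratio
            if ner > 0:
                log_lambda -= ner * math.log(ner / ne)
            # Pearson term (assumed form, see the text above)
            expected = ne * nr / n
            chi_square += (ner - expected) ** 2 / expected
    df = len(counts_1) - 1  # k degrees of freedom for scores 0..k
    return -2.0 * log_lambda, chi_square, df
```

Both statistics are referred to the chi-square distribution with k degrees of freedom. The sketch does not reduce the degrees of freedom when a score group is empty in both samples, which a careful application would have to consider.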

Application of these tests requires that there be no
differences between the groups among the item parameters, so in
this case they can strictly be used only for the MD items.
Nevertheless, the tests were used on all three scales and the
results are presented below:





              Likelihood                       Means of raw scores
Item type     ratio test     χ²-test     df     Boys      Girls

LD               [?]         53.2**      12     4.1?      3.55
MD              12.0         11.9        10     5.77      5.61
HD              14.5          8.2        15     [?]       8.76



The two tests give highly similar results except for the HD
items and they both agree that only for the LD items is there a
significant difference among the distributions of person
parameters. On this scale the boys have a higher mean than the
girls, which is also the case for the other two scales even
though the differences on the latter are considerably smaller.

According to the finding reported above that there is a
correlation between the sex bias and item discrimination, we
might perhaps have expected to find a difference in favor of the
girls on the HD items, which is obviously not the case. Three
explanations can be put forth to account for this. First it
should be noted that even though some items may be found to be
biased against one group there is nothing that says that this
group will have a lower mean on these items, since the mean is
also affected by the distribution of person parameters. Second
it must be observed that if there are in the test a few items
which are severely biased against one group, several of the
others will appear to have at least a small bias in favor of
this group, which follows from the fact that the constraint must
be imposed that the parameters shall sum to zero. As the third
explanation, it can be pointed out that even though we in this
case have interpreted the differences between the groups in
terms of the sex variable, there is evidently in this sample a
correlation between sex and ability. This implies that the
division of the sample according to sex to some extent is
confounded with level of performance, which in turn implies that
the correlation observed between "sex bias" and item
discrimination may also be accounted for by differences in level
of performance.
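
The second explanation can be made concrete with a small numerical sketch. The item parameters below are hypothetical values chosen only for illustration: the two groups are given identical parameters except for two items, which are shifted by 1.5 for group B, and each set is then centred so that it sums to zero, as the constraint requires.

```python
def centre(parameters):
    """Impose the constraint that the item parameters sum to zero."""
    mean = sum(parameters) / len(parameters)
    return [p - mean for p in parameters]


# Hypothetical 10-item test; the values are not taken from the report.
common = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.0, 0.5, 1.0, 1.5, 2.0]
group_a = centre(common)

# Only the first two items differ between the groups.
shifted = list(common)
shifted[0] -= 1.5
shifted[1] -= 1.5
group_b = centre(shifted)

# Difference between the groups, item by item, after centring.
differences = [a - b for a, b in zip(group_a, group_b)]
for item, d in enumerate(differences, start=1):
    print(f"item {item:2d}: difference = {d:+.2f}")
```

After centring, the two shifted items differ by 1.2 rather than 1.5, and each of the eight remaining items shows a difference of 0.3 in the opposite direction, although nothing was changed for them. This is the apparent small bias in favor of the other group which the text describes.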

Let us now summarize some of the conclusions which can be drawn
from these analyses. First of all it can be concluded that the
Rasch model can be used to study item bias, both as a nuisance
in measuring devices and as a substantive area of research. (In
order to prevent any misunderstanding from arising it should
perhaps be pointed out that if all items in a test to the same
degree favor one special group, this will not be detected as any
deviation from the model assumptions. Here the problem rather is
one of definition of the ability being measured.) The model adds
two aspects to the study of item bias which are quite unique:
the first that there exists an overall test of item bias, which
is also supplemented by tests at the item level, and the second
that differences between the groups with respect to the person
parameters do not influence the results, at least not when there
are no differences among the item parameters as a function of
ability.

A second important conclusion to be drawn is that the overall
numerical tests of goodness of fit presented in chapter 3.2 have
a low power of detecting certain threats to the model, in this
case multidimensionality of the latent space, since sex is an
additional factor to ability which systematically affects
performance on some items. The implication is of course that
even though the overall test of goodness of fit is not
significant there may be a need to carry the investigation
further by division of the sample along other lines than level
of performance.

4.3 Discussion



It must be stressed that the analyses of Number Series and
Opposites presented above are nothing but examples, which can
serve to highlight a few of the characteristics of the Rasch
model. In the analyses several sources of threats to the model
have been pin-pointed and we will first discuss these and a few
more common violations of model assumptions. After that,
strategies and problems in the development of scales fitting the
model are discussed.

Sources of threat against the model

Item heterogeneity is a violation of the assumption of
unidimensionality and, as was pointed out in chapter 1.2 above,
there is no entirely satisfactory method with which one can make
sure that this assumption is not violated before the Rasch model
is applied. But it does appear as if the goodness of fit tests
(and especially the graphic tests of the items) are powerful
means with which item heterogeneity can be detected. In the
analysis of Number Series, for example, the heterogeneity caused
by some items requiring more arithmetical ability than others
was easily detected. It thus appears that the model in itself is
a very useful tool for studying unidimensionality of
measurements.

It must be strongly emphasized that the question of item
homogeneity is a question of finding items measuring the same
ability, and not a question of excluding items not fitting the
model. This implies among other things that purely statistical
criteria cannot be used in selecting items and that a very clear
grasp of the content and the processes required of each item is
demanded.

Speededness of the test is obviously a violation of the model
assumptions since if a person does not have time to read an item
any statement about the probability of a correct answer as a
function of the person parameter will be meaningless. None of
the LT models considered here can thus be supposed to properly
represent the case when there is any amount of speededness
involved. It can be pointed out, however, that Rasch (1960) has
proposed a Poisson process model for one particular case where
speed is involved, namely tests of oral reading speed.

For almost all group tests of ability there is a fixed time
limit (this also holds true for some achievement tests) which
makes them at least in principle speeded. But knowledge about
the time limit under which a test is administered does not say
much about whether some persons would answer additional items
correctly given unlimited time; the omitted items may all have
been so difficult as to make the probability of a correct answer
very close to zero.

Thus whether a test with a time limit is speeded or not is an
empirical question (in a sense this applies to tests given
without time limits too, since there may be self-imposed "time
limits" which are consequences of boredom and tiredness) and it
does appear as if it is possible to investigate this question
with the Rasch model. Not only is the analysis of Number Series
presented above an example of this; Rasch (1960) was also able
to identify lack of fit for a test as being a consequence of the
test being speeded.

But it should also be pointed out that if we know that speed is
the only violation of the assumptions, the Rasch model can be
used to "partial out" the speed factor. This is effected through
estimating the person parameter for the "power" from the scores
obtained only on the attempted items. (A study using such
procedures has been presented by Allerup, ?lov, and Spelling,
1977).
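
A minimal sketch of what estimating the person parameter from the attempted items only might look like is given below. It makes several assumptions which are not taken from the report: the item parameters are treated as known difficulties δ with P = exp(ξ - δ) / (1 + exp(ξ - δ)), the estimate is an ordinary maximum likelihood estimate found by Newton-Raphson iteration, and items not reached are coded `None`. The function names are hypothetical.

```python
import math


def rasch_probability(xi, delta):
    """Probability of a correct answer (assumed sign convention)."""
    return 1.0 / (1.0 + math.exp(-(xi - delta)))


def estimate_ability(responses, difficulties, tol=1e-8, max_iter=100):
    """ML estimate of the person parameter from attempted items only.

    responses holds 1 (correct), 0 (wrong) or None (not attempted).
    """
    attempted = [(x, d) for x, d in zip(responses, difficulties)
                 if x is not None]
    score = sum(x for x, _ in attempted)
    if score == 0 or score == len(attempted):
        raise ValueError("no finite estimate for a zero or perfect score")
    xi = 0.0
    for _ in range(max_iter):
        probs = [rasch_probability(xi, d) for _, d in attempted]
        gradient = score - sum(probs)          # observed minus expected score
        information = sum(p * (1.0 - p) for p in probs)
        step = gradient / information
        xi += step
        if abs(step) < tol:
            break
    return xi


difficulties = [-1.0, -1.0, 1.0, 1.0, 2.0, 3.0]
reached_only = estimate_ability([1, 1, 0, 0, None, None], difficulties)
scored_wrong = estimate_ability([1, 1, 0, 0, 0, 0], difficulties)
```

In this hypothetical case the estimate based on the four attempted items is higher than the one obtained when the two items not reached are scored as wrong, which is the sense in which the speed factor is partialled out.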

Guessing can probably never be completely avoided but certain
kinds of items, i.e. multiple-choice items, are of course
especially likely to be affected by this extraneous factor.
Unless active attempts have been made to minimize guessing, the
Rasch model (or the Birnbaum model) should be used only with
great caution when the items are of multiple-choice type,
keeping in mind that the item parameters cannot be expected to
remain invariant over samples differing in levels of ability.

Varying item discrimination is a kind of threat to the validity
of the Rasch model which is quite difficult to discuss since its
implications are hard to identify at a more concrete level. In
chapter 1.1 the Flogging Wall test was used as an example, and
it was pointed out that the discrimination parameter of the
canes corresponds to the amplitude of the flogging, i.e. to item
reliability.

It certainly is possible to imagine that different kinds of
items have different reliability; items requiring a constructed
response are for example usually more reliable than
multiple-choice items. To take another example, T. Lindblad
(1977, personal communication) has pointed out that tests of
listening comprehension for measuring foreign language
achievement tend to be less reliable than tests of reading
comprehension and that the reason for this is probably that
listening comprehension tests are more susceptible to chance
influences than reading comprehension tests.

In one sense it could be argued that those items in the Number
Series test which were found to be influenced by arithmetical
ability do have lower discrimination parameters, but since this
can be explained with reference to a systematically working
factor, it is better described as representing
multidimensionality. Also in the analyses of Opposites we found
that the items had different discriminative abilities which were
among other things related to how much random guessing took
place. But again, of course, it is basically a kind of
multidimensionality which causes this to occur.

It may of course be possible to find items which measure the
same unidimensional ability and which are, to different degrees,
affected by chance factors (i.e. have different discrimination
parameters). I do suspect, however, that in most cases when what
appears to be varying item discrimination is found, a closer
look will reveal that some kind of multidimensionality is
involved.

Item bias is also a kind of multidimensionality since in this
case variables associated with a particular group make the items
systematically too easy or too difficult. A good knowledge of
the items as well as of the sample is of course essential to
produce a fruitful approach to this problem.

Constrained responses and learning effects from one item to
another are threats to the model as well. If, for example, four
responses are derived from a question requiring the pairing with
respect to meaning of four given English words with four given
Swedish words, those of the examinees who know three of the
answers will automatically get the fourth pair correct too,
which obviously is a violation of the assumption of local
statistical independence.
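
The dependence in the pairing example can be verified by brute force. The sketch below, which is only an illustration and uses a hypothetical function name, enumerates every way of pairing four English words with four Swedish words, each Swedish word being used once, and counts the number of correct pairs in each arrangement.

```python
from collections import Counter
from itertools import permutations


def matching_score_distribution(n_pairs=4):
    """Count the arrangements giving each number of correct pairs."""
    counts = Counter()
    for answer in permutations(range(n_pairs)):
        # item i is answered correctly when it is paired with choice i
        n_correct = sum(1 for item, choice in enumerate(answer)
                        if item == choice)
        counts[n_correct] += 1
    return dict(sorted(counts.items()))


print(matching_score_distribution())  # {0: 9, 1: 8, 2: 6, 4: 1}
```

Of the 24 possible arrangements none gives exactly three correct pairs, so the response to the fourth item is fully determined by the other three. Under local independence a score of three would have a positive probability.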



Learning effects from one item to another are violations of the
model assumptions of somewhat the same kind. Such effects may be
very difficult to identify but it can be mentioned that it is
also possible to generalize the Rasch model so that it can be
used to study this problem specifically (see chapter 6.2).

Person fit may be a problem too: idiosyncratic working methods,
cheating and carelessness (person reliability does appear to be
at least as fruitful a concept as item reliability) are a few
such threats to the model. Some of these factors can be
controlled in the administration of the items, others only
through excluding persons.

Wright and Mead (1977) have presented a test of person fit,
based on analysis of residuals. There is also the possibility of
constructing a theoretically satisfying test of person fit under
the conditional approach. Since the probability of each observed
score vector can easily be computed (equation 2.1.8), all that
is needed to obtain a p-value is to sum the probabilities of any
more extreme score vectors (i.e. those with lower probabilities)
than the observed. The test does, however, appear to be
computationally complex, so it has not been implemented in the
present version of PML.
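
The idea of the conditional person fit test can be sketched by brute-force enumeration. The code below is not part of the program described in this report. It assumes known item difficulties δ and the standard conditional distribution of a response vector given the raw score r, in which the probability is proportional to exp(-Σ x_i δ_i). It also counts the observed vector itself among the vectors that are as extreme or more extreme, which is a conventional choice rather than something stated in the text.

```python
import math
from itertools import combinations


def person_fit_p_value(responses, difficulties):
    """Exact conditional p-value for one response vector.

    Sums the conditional probabilities of all response vectors with
    the same raw score that are no more probable than the observed one.
    """
    k = len(responses)
    r = sum(responses)

    def weight(correct_items):
        # unnormalized conditional probability of a response vector
        return math.exp(-sum(difficulties[i] for i in correct_items))

    observed = weight([i for i, x in enumerate(responses) if x == 1])
    total = 0.0
    as_or_more_extreme = 0.0
    for correct_items in combinations(range(k), r):
        w = weight(correct_items)
        total += w
        if w <= observed * (1.0 + 1e-12):  # tolerance for rounding error
            as_or_more_extreme += w
    return as_or_more_extreme / total


difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0]
p_expected_pattern = person_fit_p_value([1, 1, 0, 0, 0], difficulties)
p_reversed_pattern = person_fit_p_value([0, 0, 0, 1, 1], difficulties)
```

The number of vectors to be enumerated is the binomial coefficient of k over r, which grows very quickly with the number of items. This illustrates why the test is described as computationally complex.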



Strategies and problems in the development of Rasch scales

The usual procedure in attempting to find a set of items fitting
the model is to select, on the basis of a tryout, out of a
larger set of items those which appear to fit the model and
then, at best, cross-validate the set on a new sample. Such a
procedure is reasonable enough (at least when the reasons for
misfit are not to be found in factors other than item
heterogeneity) but there are some risks involved which need to
be discussed. It has already been pointed out that when item
heterogeneity is at issue the items do not fit the model but may
fit each other. If a vast majority of the items measure the same
ability, there being just a few deviating ones, the latter can
easily be identified in the graphic tests. But if there is more
item heterogeneity, the graphic tests are useless if used to
select items which have a plot where the points fall close to
the diagonal (and a statistical test is even worse); in this
case it is necessary to keep an eye on the pattern of deviations
common to several items, and if such a group of items shows
similarities also in other respects such as content, it is
reasonable to select those and investigate if they form a Rasch
scale.

But this can at times be a risky strategy. Sometimes several
threats to the model are in operation and the problem is that
these can combine in different ways for different items and even
cancel out. It is quite easy to imagine, for example, what would
happen if guessing is allowed in a set of heterogeneous items;
it would almost surely be impossible to get anything meaningful
out of such an analysis.

This indicates that when attempts are made to maximize item
homogeneity it is essential that all or most of the other
sources of threat to the model assumptions are controlled for,
which is reasonably easy with respect to factors such as
guessing and speededness but which may be more difficult with
respect to others.

Another conclusion which is inevitable in this light is that a
very clear conception of the content of the items in the tryout
is required if a meaningful selection and/or classification of
items is to be made.

Degree of fit and inferential tests

There are problems involved in using statistical tests to decide
whether the data show an acceptable fit or not. One problem is
that even though one test may indicate a good fit, another may
indicate a very poor fit. Another problem is related to sample
size; when large samples are used very small deviations will
result in significant values on the test statistic and when
small samples are used even gross deviations may remain
undetected. The first problem can to some degree be solved by
use of the measure of redundancy (chapter 3.3) but how large a
sample is required to obtain a reasonable power in the
statistical tests is as yet an unresolved problem.
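
The dependence on sample size is easy to demonstrate with a generic Pearson chi-square statistic. The sketch below is not one of the goodness of fit tests of chapter 3; it only shows, with hypothetical proportions, that a fixed relative deviation from the model gives a test statistic which grows in direct proportion to the number of persons.

```python
def pearson_chi_square(observed, expected):
    """Ordinary Pearson chi-square for observed and expected counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_square_for_sample_size(n, observed_props, model_props):
    """Chi-square when the same proportions are observed in n persons."""
    observed = [n * p for p in observed_props]
    expected = [n * p for p in model_props]
    return pearson_chi_square(observed, expected)


# Hypothetical case: the model predicts .50 correct, .52 is observed.
small = chi_square_for_sample_size(100, [0.52, 0.48], [0.50, 0.50])
large = chi_square_for_sample_size(10000, [0.52, 0.48], [0.50, 0.50])
```

With 100 persons the statistic is 0.16, and with 10 000 persons it is 16, which exceeds the critical value of 3.84 for one degree of freedom at the .05 level, although the deviation from the model is the same in both cases.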

These problems indicate that not too much weight should be
placed on the inferential tests of goodness of fit, and
especially not when the sample sizes are extreme in either
direction. Less formalized approaches thus appear to be
necessary complements in evaluating fit. The graphic tests of
items are here valuable and content-related considerations are
indispensable.

But it must also be pointed out that the degree of fit which is
necessary to some degree depends upon the applications intended.
For some applications, such as the study of unidimensionality,
we can accept only small deviations, but for others, such as
perhaps more technologically oriented applications, we might
expect to get useful results even when the fit is not the best
possible. It does in fact appear to be a very important area of
research to study how much the model assumptions can be violated
without jeopardizing different kinds of applications of the
model.



The concept of unidimensionality 

The notion of unidimensionality is essential in all the LT
models but particularly so in the Rasch model since there is no
possibility of treating different kinds of multidimensionality
as varying item discrimination in this model. There is thus
reason to take up the notion of unidimensionality for special
discussion.

In my opinion it is as yet an unanswered question what
properties those scales fitting the Rasch model have from a
psychological perspective. Are they for example so narrow and
specific that they will be impractical to use? My own
impression, which is, however, only based on analyses of tests
originally constructed within the framework of classical test
theory, is that the Rasch model is extremely sensitive to any
kind of multidimensionality and that the scales thus tend to be
quite narrow. It does appear to be a research question of the
highest priority to investigate the "psychological width" of
item sets which do fit the model and to study how one should
proceed if it is found that they in fact tend to be very narrow.

But be that as it may, the notion of unidimensionality is
nevertheless of utmost importance in any attempt to make
measurements. Some arguments in favour of this view have already
been presented (page 9) but there is reason to emphasize once
again the central importance and great use of the concept of
unidimensionality.

As has been pointed out by Lumsden (1976) the requirement that
tests should be unidimensional has been seriously neglected in
classical test theory, which is probably partly due to the fact
that there has existed no satisfactory method for studying
unidimensionality but probably also to the fact that reasonable
degrees of success in practical applications have been obtained
without imposing this requirement.

But whenever anything more than some degree of correlation with
an extraneous measure is to be achieved, the assumption of
unidimensionality is essential. Lumsden (1976) stressed that
measurement is always measurement of an attribute or a property
(a latent trait) so it may be asked: "How can we make any claims
to measure if our measuring instrument has a number of different
sets of items based presumably on different attribute
conceptions?" (p. 266).

To construct a test intended to measure an attribute we of
course need a conception of the attribute at once when the work
is begun. But this conception is likely to be vague and there
will be little basis for deciding whether an item or an item
type does reflect the attribute. But through a continuing
process of revision of the conception of the attribute and
revision of the items used to measure the attribute we are
likely to obtain a better understanding both of the attribute
and the measuring device. In such a process of revision the
Rasch model can be supposed to contribute greatly, even though
it is of course not the only method to be used in such work.

The notion of unidimensionality implies that only one attribute
should be measured with the same test, but it does not imply
that the latent trait in itself is unidimensional; it may well
be functionally (and factorially) complex and we can certainly
not claim that there is one unitary process underlying test
performance. (But there are in fact developments of the Rasch
model which are well suited to the study of what kind of
processes contribute to the difficulty of items, see chapter
6.2).

Let me give one more example showing the importance of
unidimensionality. In experimental educational research it is
common practice to administer to groups given different
treatments the same post-test, and then compare the outcomes in
the treatments in terms of the means of raw scores obtained on
the post-test. But if there are interactions between treatment
and outcomes so that the difficulties of the items in the
post-test vary as a function of treatment, such a comparison can
only produce more or less meaningless results. In such a case we
would want to reorganise the items in the post-test into
internally homogeneous scales which measure the same thing in
all treatments; the Rasch model can easily be applied to
accomplish this.






Chapter 5 
SOME AREAS OF APPLICATION



In the preceding chapter it was pointed out that one very
important area of application of the model is to study the
internal workings of a test. But it is also true that once
scales fitting the model have been developed it is possible to
solve within the framework of the Rasch model a number of
measurement problems. We will briefly indicate some of these
possibilities.

5.1 Test optimization

The problem of how a test should be organized in terms of number
of items, level of difficulty and spread of item difficulty in
order to obtain a suitable precision of measurement can rather
easily be solved using the information function with respect to
the person parameters (chapter 2.5).

In the Rasch model the information with respect to a person
parameter (and the item parameter) contained in the response to
an item is a function only of the probability of a correct
answer, which is easily seen if we rewrite (2.3.2) slightly:

(5.1.1)   $$I_i(\xi_v) = P_{vi}(1 - P_{vi})$$

In Figure 5.1 I_i(ξ_v) for an item is shown as a function of the
probability of a correct answer. The maximum of the curve is
where P_vi = .50, but we can also note that the information
obtained is relatively constant within the range
.?0 ≤ P_vi ≤ .?0 (the first digits of the bounds are illegible
in the source copy).
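
A few values of the information function make the shape of the curve concrete. The short sketch below simply evaluates P(1 - P) at a set of probabilities; the function name is hypothetical.

```python
def item_information(p_correct):
    """Information in one item response, I = P(1 - P)."""
    if not 0.0 <= p_correct <= 1.0:
        raise ValueError("a probability must lie between 0 and 1")
    return p_correct * (1.0 - p_correct)


maximum = item_information(0.5)
for p in (0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9):
    info = item_information(p)
    print(f"P = {p:.1f}   I = {info:.2f}   share of maximum = {info / maximum:.2f}")
```

The information is .25 at P = .50, .21 at P = .30 or .70, and .16 at P = .20 or .80, so an item loses little information as long as the probability of a correct answer stays reasonably close to .50.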







Figure 5.1 The information in an item as a function of probability of a 
correct answer. 



From chapter 2.3 it is recalled that the information in a test
with respect to a person parameter is the sum of the information
contributed by each item:

(5.1.2)   $$I(\xi_v) = \sum_{i=1}^{k} P_{vi}(1 - P_{vi})$$

and it will also be recalled that the standard error of
measurement SEM(ξ) is:

(5.1.3)   $$\mathrm{SEM}(\xi) = \frac{1}{\sqrt{I(\xi)}}$$






From these properties of the model it follows that the only
factors affecting the precision of measurement at any given
level of ability are the number of items in the test and the
distribution of item parameters. But it also follows that the
standard error of measurement varies as a function of ability,
which can be illustrated with some examples.

For two tests, both with 40 items, the SEM(ξ) has been plotted
against ξ in Figure 5.2. One of the tests (peaked) contains
items which all have the same parameter (equal to 0) and in the
other test (spaced) the item parameters vary between -3 and 3 in
equal steps. We see that the peaked test gives a higher SEM(ξ)
for extreme person parameters while it gives a lower SEM(ξ) for
the intermediate range of abilities.




Figure 5.2. Standard errors of measurement of ability as a function of 
ability for two hypothetical tests. 






The highest precision of measurement is of course always
obtained for any given level of ability when at that level the
probability of a correct answer is .50 for all the items. We can
thus formulate the very simple rule that when the purpose is to
measure just one level of ability, items should be selected
which have the same parameter value as the ability to be
measured. The number of items needed (k) to obtain any wanted
precision (SEM) is of course easily determined:

(5.1.4)   $$k = \frac{1}{.25\,\mathrm{SEM}^2}$$
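
Formula (5.1.4) follows from (5.1.3) because each item contributes an information of .25 when the probability of a correct answer is .50. A small helper, with a hypothetical name, makes the arithmetic explicit and rounds up to a whole number of items.

```python
import math


def items_needed(target_sem):
    """Items needed when every item has P = .50 at the ability measured."""
    if target_sem <= 0:
        raise ValueError("the wanted standard error must be positive")
    exact = 1.0 / (0.25 * target_sem ** 2)
    # rounding guards against floating-point noise before rounding up
    return math.ceil(round(exact, 9))


for wanted in (0.50, 0.40, 0.25):
    print(f"SEM = {wanted:.2f} requires {items_needed(wanted)} items")
```

A wanted standard error of .50 thus requires 16 items and one of .25 requires 64, so halving the standard error quadruples the number of items.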

Mostly, however, a test is intended for use over a range of
abilities and to reach any statement about how a test should be
built up it is necessary to make assumptions about the
distribution of person parameters. If we take a look at Figure
5.2 again in this light we find that either of the two tests can
have the lowest mean of standard errors and thus have the best
subject separation (or, equivalently, have the highest
reliability). If, for example, the person parameters are
distributed normally with zero mean and unit variance we would
find that for more than 90% of the persons in the sample the
peaked test has the lowest SEM(ξ) and would consequently yield
the best subject separation. When the ISS's for the two tests
were computed under these assumptions the values found were .88
and .85 for the peaked and spaced tests respectively (the
corresponding values of KR20 were .89 and .85). If, however, we
assume another distribution of person parameters such as a
rectangular one or a normal distribution with a standard
deviation which is considerably greater than unity it is easy to
see that the peaked test will give an ISS lower than that for
the spaced test.

The problem of how items with different parameters should be
chosen so as to obtain maximum precision of measurement (in
terms of the mean of the standard errors) has been studied in
great detail by Douglas (1975) and Wright and Douglas (1975).
They found that when the sample has a normal distribution of
person parameters a peaked test centered on the mean of the
sample is optimal when the standard deviation (s) is not larger
than 1.??-1.?0 (digits illegible in the source copy) but that
for samples with greater s uniformly spaced item difficulties
should be used. For example, when s=1.75 an optimal difference
of 6 between the highest and lowest item parameter was found
(for this difference the term width, W, was used, so here W=6).

For rectangular distributions of person parameters lower values
of s were found where a change from a peaked to a spaced test is
motivated, the limit being around s=.75. Compared to the normal
distribution a rectangular distribution of person parameters
requires a greater spread of item parameters for the same s to
obtain the best precision. For example, when s=1.75 the optimum
was found at W=10 for the rectangular distribution.

Wright and Douglas (1975) have summarized their studies in some
simple rules for test construction: they do advise, for example,
that uniformly spaced item parameters with W=4s, where s is the
best guess of the standard deviation in the sample, should be
used. But it must of course be realized that use of such simple
rules implies that some accuracy is sacrificed.

It is of some interest to compare the conclusions about optimal
test design drawn here with those recommendations issued within
the framework of the classical test theory. It has long been
known that a test with uniform item difficulties (with a
proportion of correct answers of .50, when no guessing is
allowed) generally has a higher reliability than a test in which
the item difficulties are spaced (e.g. Gulliksen, 1945; Lord,
1952). Another conclusion which has been drawn is that a better
reliability is obtained if items with a high reliability, as
measured for example with the biserial or point biserial
correlation, are selected. But it has also been noted that for a
peaked test there is an optimum item reliability beyond which
the reliability of the test decreases; this is the so called
attenuation paradox (e.g. Loevinger, 1954).






The explanation as to why the attenuation paradox occurs is
quite simple if stated in general terms: If the reliability of
all items is very high, the correlations between all items will
approach unity (if the test is unidimensional), which means that
a person who passes one item will pass all the others and that a
person who fails one item will fail all the others. The
distribution of scores will thus tend to be bimodal with a very
good discrimination at one level of ability but with virtually
no discrimination between examinees at other levels of ability.
The attenuation paradox occurs only if the items all are of the
same difficulty and the solution of the problem is, of course,
to use items with spaced difficulties (e.g. Brogden, 1946;
Cronbach & Warrington, 1952).

The conclusion was drawn above that when the variance of the 
person parameters in the sample is small a peaked test should 
be used, otherwise not. This conclusion is in fact identical 
to the solution of the problems caused by the attenuation 
paradox which follows from the fact that with a higher, for 
all items common, discrimination there is in the Rasch model 
a higher standard deviation of the person parameters. 

In fact the real explanation of the attenuation paradox is of 
course that since the standard errors of measurement are lar- 
ger for certain scores than for others, constructing the test 
so that for a sample it results in many scores which have a 
.large standard error will have detrimental effects on the 
reliability. Thus what in classical theory is a paradox follows 
in the Rasch model (and all the other LT models) naturally 
from the fact that the standard errors of measurement vary as 
a function of ability. 
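
The dependence of the standard error on ability can be made concrete with a small sketch. It assumes the usual asymptotic result that the standard error of the maximum likelihood ability estimate is the inverse square root of the summed item variances P(1-P); the function names and the 20-item peaked test are hypothetical and are not taken from the PML program.

```python
import math

def rasch_probability(theta, difficulty):
    """Probability of a correct answer in the Rasch model (log scale)."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))

def standard_error(theta, difficulties):
    """Asymptotic standard error of the ability estimate at theta."""
    information = 0.0
    for d in difficulties:
        p = rasch_probability(theta, d)
        information += p * (1.0 - p)  # each item contributes P(1-P)
    return 1.0 / math.sqrt(information)

# Hypothetical peaked test: 20 items, all with difficulty 0 on the log scale.
peaked_test = [0.0] * 20

for theta in (-3.0, -1.5, 0.0, 1.5, 3.0):
    print(f"theta={theta:+.1f}  SE={standard_error(theta, peaked_test):.2f}")
```

For such a peaked test the standard error is smallest where ability equals the common item difficulty and grows towards both extremes, which is the pattern the paragraph above appeals to.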

5.2 Tailored testing

It is obvious that the strategy of giving the same set of test 
items to persons of all levels of ability will necessarily 
result in different precision of measurement at different
levels of ability. The only possibility of obtaining standard
errors of measurement which are equal over a range of abilities
is to give different items to different persons, i.e.
tailored testing.

The LT models are of course extremely well-suited for tailored
testing since it is possible to estimate on a common ability
scale results obtained by different examinees on different
items. The next section demonstrates how such a translation
into a common metric can be effected with the Rasch model.

The basic principle is of course that all the persons should
take items on which they have a probability of .50 of giving a
correct answer. Usually computer-based administration of the
items has to be used and there are different strategies by
which items can be selected from a pool so as to keep as close
to this requirement as possible (Lord, 1971, 1974). There is
of course additional use of the computer when, after the
testing, the scores on the items are to be translated into the
metric of the latent trait.
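
One very simple selection rule consistent with this principle can be sketched as follows. It is only an illustration, not one of the strategies described by Lord: the item pool, the item identifiers and the function names are hypothetical, and the current ability estimate is assumed to be supplied from elsewhere.

```python
import math

def rasch_probability(theta, difficulty):
    """Probability of a correct answer in the Rasch model (log scale)."""
    return 1.0 / (1.0 + math.exp(-(theta - difficulty)))

def next_item(theta_estimate, pool, administered):
    """Pick the unused item whose success probability is closest to .50."""
    candidates = {i: d for i, d in pool.items() if i not in administered}
    return min(
        candidates,
        key=lambda i: abs(rasch_probability(theta_estimate, candidates[i]) - 0.5),
    )

# Hypothetical pool: item identifier -> difficulty on the log scale.
pool = {"A": -2.0, "B": -0.5, "C": 0.4, "D": 1.8}

print(next_item(0.5, pool, administered=set()))
print(next_item(0.5, pool, administered={"C"}))
```

In the Rasch model the item with success probability closest to .50 is simply the one whose difficulty lies closest to the current ability estimate, so the rule amounts to a nearest-difficulty search in the pool.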

Wright and Douglas (1975) have, however, presented a system
for self-tailored testing based on simple approximations in
which a computer need not be involved either in selection of
items or in computing person parameters:

"The person to be measured can be handed a booklet of
test items more or less equally spaced in increasing 
difficulty from easiest to hardest and invited to choose 
any starting place in the booklet with which he feels 
comfortable. From that self-chosen starting point the 
examinee can work at his own will and speed in either 
direction, forward into harder items or backward into 
easier ones, until he reaches his own performance limits 
or runs out of time. Whatever the level and length of 
the self-chosen segment, all that are needed to obtain 
an objective item-free person measure and its standard 
error are the serial numbers of the easiest and hardest 
items tried and the number of successes in between. 
These three observations are sufficient to look up in a 
simple series of tables the person's estimated measure
and the standard error of that estimate." (Wright &
Douglas, 1975, p. 13-'<1). 

As was mentioned above this system is based on certain
approximations and it is easy to imagine practical problems in
its application; it is, however, possible that it might work so
well that it can be profitably exploited.

5.3 Test equating and linking

One area of great potential for applications of the Rasch
model is equating and linking of tests, i.e. expressing on the
same scale raw scores obtained on different tests (or sets of
items).

Test equating 

If it can be confidently assumed that two (or more) tests
measure the same trait, equating of scores is a very simple
task, which is illustrated below. However, since the assumption
that the tests measure the same ability is critical we
will first address the problem of how to test this assumption.

Let us assume that the tests have been given to the same sample
and that separate analyses of the tests have indicated a
good fit to the model. If the tests measure the same ability
then we must also find a good fit if all the items are analyzed
together. This straightforward approach of testing the
assumption is conceptually simple, but it may be impractical
since when all the items are pooled, a very long test may be
the result and it will be recalled that the overall numerical
tests are cumbersome to compute when the number of items is
large. Fortunately there exists a likelihood ratio test which
directly tests the hypothesis that the two sets of items
measure the same ability (Martin-Löf, 1973, p. 135-136). This
test calls for some hand computations (or a short computer
program) but requires otherwise only that the parameters are
estimated for each test and for the pooled set of items.






Let us call the number of items in the two tests $k_1$ and
$k_2$, with $k = k_1 + k_2$. We define further $n_{r_1 r_2}$ to
be the number of persons with raw score $r_1$ on the first test
and raw score $r_2$ on the second test, $n_r$ to be the number
of persons with raw score $r$ on the pooled set of items, and
$n$ to be the total number of persons. Let $H$ be the maximum
value of the logarithm of the likelihood function (3.2.1) and
$H_1$ and $H_2$ the corresponding values for each test.
Martin-Löf has then shown that the test statistic is:

$$\log\lambda = -\sum_{r_1=0}^{k_1}\sum_{r_2=0}^{k_2} n_{r_1 r_2}\log\frac{n_{r_1 r_2}}{n} + \sum_{r=0}^{k} n_r\log\frac{n_r}{n} + H - H_1 - H_2 \qquad (5.3.1)$$

and that $-2\log\lambda$ is approximately chi-square distributed
with $k_1 k_2 - 1$ degrees of freedom when $n \to \infty$.

The values of $H$, $H_1$ and $H_2$ are obtained on the computer
printouts from the corresponding analyses. The values of the
other terms appearing in (5.3.1) can be obtained either
through hand calculations based on the bivariate and
univariate distributions of test scores, or by writing a special
program to perform these simple but sometimes tedious tasks.
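
A minimal sketch of such a special program is given below. It assumes that the statistic has the usual Martin-Löf form, in which -2 log λ equals twice the sum of n(r1,r2) log(n(r1,r2)/n) over the bivariate score distribution, minus the corresponding sum over the pooled score distribution, plus H1 + H2 - H. The function names are hypothetical, and the three maximized log-likelihoods are taken as given from the separate analyses rather than computed here.

```python
import math
from collections import Counter

def martin_lof_statistic(scores_1, scores_2, h_pooled, h_1, h_2):
    """Return -2 log(lambda) for two sets of items given to the same persons.

    scores_1, scores_2: raw scores of each person on the two tests.
    h_pooled, h_1, h_2: maximized log-likelihoods from the three analyses.
    """
    n = len(scores_1)
    joint = Counter(zip(scores_1, scores_2))               # n_{r1 r2}
    pooled = Counter(a + b for a, b in zip(scores_1, scores_2))  # n_r
    # Empty cells never appear in the counters, so 0*log(0) is skipped.
    joint_term = sum(c * math.log(c / n) for c in joint.values())
    pooled_term = sum(c * math.log(c / n) for c in pooled.values())
    log_lambda = -joint_term + pooled_term + h_pooled - h_1 - h_2
    return -2.0 * log_lambda

def degrees_of_freedom(k_1, k_2):
    """Degrees of freedom of the chi-square approximation."""
    return k_1 * k_2 - 1

# Tiny made-up data set, only to show the call.
stat = martin_lof_statistic([0, 1, 1, 2], [1, 0, 1, 2], -10.0, -4.0, -4.0)
print(round(stat, 4), degrees_of_freedom(20, 20))
```

The resulting value would be referred to a chi-square table with the stated degrees of freedom; for two 20-item tests these number 399.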

The test presented above can be expected to be of use not only
in testing the homogeneity of two distinct sets of items
intended to be used as separate tests but also when very long
tests are constructed. Since in such cases the overall
numerical tests of goodness of fit are out of reach, at least if
one is operating in an environment where computer time is in
limited supply, a good strategy may be to develop out of the
same pool of items two tests fitting the model and then
investigate whether they can be put together into one long
test.

Let's now turn to the problem of equating raw scores obtained
on different tests. It will be recalled that in any estimation
of the item parameters a constraint must be imposed, for
example that the sum of the item parameters expressed on the
log scale is zero, which effects a fixation of the origin of
the scale. The ability scales associated with two tests
measuring the same ability are thus the same except for the
arbitrary origin of the scales. But if the two tests are given
to the same sample we are in the position to estimate the
difference in origin of the scales since, of course, the same
sample must have the same mean of ability whichever test is
used.

There are two methods which can be employed to estimate the
difference in origin of ability scales, both resulting in a
simple additive constant to be used as a correction factor
(see e.g. Kifer, 1976; Rentz & Bashaw, 1975). The first
method, the so-called "ability method", simply consists of
calculating the difference in the means of ability estimated
from the two tests and using the obtained difference as the
correction factor. In the other method, the so-called
"difficulty method", the item parameters from both tests are
estimated together and the difference between the means of the
estimated item parameters is used as the correction factor.

These two methods give theoretically identical results but
there is at least one thing that speaks in favour of the dif- 
ficulty method: since the person parameters cannot be esti- 
mated for zero or perfect raw scores (such persons are exclu- 
ded from the analysis) the ability method must not be used 
whenever different persons obtain such scores on the two 
tests. 

The difficulty method will here be illustrated with some
generated data. For a sample of 1 000 persons, distributed
N(0,1), scores were generated for 40 items, of which 20 had
the parameter -1 and 20 the parameter 1. It will be supposed
that these two groups of items can be given as two forms, one
simple and one difficult, and that we above all are interested
in knowing which raw score on the simple form corresponds to
which raw score on the difficult form.






The test of the homogeneity of the two forms gave $\chi^2 = 302.62$
with 399 degrees of freedom so it is obviously no problem to
do the equating. Not surprisingly the difference between the
means of the parameters of the simple and difficult item
sets turned out to be -2 in the analysis. This value of -2 is
the correction factor, which of course means that we shall
subtract 2 from (or rather add -2 to) the ability scale for the
simple items to get the corresponding location on the ability
scale for the difficult items. Table 5.1 presents the table
of conversion from raw scores on the two forms into the
ability scale of the difficult form.
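
How the correction factor is applied can be sketched in a few lines. The sketch assumes that all items within a form have the same difficulty, as in the generated data, in which case the maximum likelihood person parameter for raw score r on k items has the closed form log(r/(k-r)) relative to the common item difficulty; the function names are hypothetical and the PML program itself is not involved.

```python
import math

N_ITEMS = 20        # items per form in the generated example
CORRECTION = -2.0   # correction factor reported in the text above

def person_parameter(raw_score, n_items=N_ITEMS, common_difficulty=0.0):
    """ML person parameter when all items share one difficulty (log scale)."""
    if not 0 < raw_score < n_items:
        raise ValueError("no finite estimate for zero or perfect raw scores")
    return common_difficulty + math.log(raw_score / (n_items - raw_score))

def simple_form_on_difficult_metric(raw_score):
    """Person parameter for a simple-form raw score, difficult-form metric."""
    return person_parameter(raw_score) + CORRECTION

for r in (5, 10, 15):
    print(r, round(person_parameter(r), 2),
          round(simple_form_on_difficult_metric(r), 2))
```

Within each form the item parameters are normed to zero, so the same raw score yields the same uncorrected value on both forms; only the added constant separates the two columns of person parameters.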



Table 5.1 Person parameters expressed in the metric of the
difficult test for raw scores obtained on the
simple and difficult forms.

Raw score on the    Person       Raw score on the    Person
difficult form      parameter    simple form         parameter

       1             -2.94              1             -4.94
       2             -2.18              2             -4.18
       3             -1.73              3             -3.73
       4             -1.39              4             -3.39
       5             -1.10              5             -3.10
       6             -0.85              6             -2.85
       7             -0.62              7             -2.62
       8             -0.41              8             -2.41
       9             -0.20              9             -2.20
      10              0.00             10             -2.00
      11              0.20             11             -1.80
      12              0.41             12             -1.59
      13              0.62             13             -1.38
      14              0.85             14             -1.15
      15              1.10             15             -0.90
      16              1.39             16             -0.61
      17              1.73             17             -0.27
      18              2.20             18              0.20
      19              2.94             19              0.94



From the figures presented in the table it is easily
understood that before the conversion was made the scale of
ability for the simple form was numerically exactly the same as
that for the difficult form. The reason for this is of course
that separate analyses of the two item sets produce exactly
the same item parameters (they are all equal to zero, more
or less, as a consequence of the normation) so consequently
there can be no difference between the numerical values of
the person parameter corresponding to a certain raw score
(but the distribution of person parameters is of course
radically different).

Using linear interpolation methods a graph has been construc- 
ted (Figure 5.3) to show the relation between raw scores on 
the two forms, i.e. using the common scale of ability, raw 
scores on the simple form have been translated into raw 
scores on the difficult form. Obviously there is a curvi- 
linear relationship between raw scores on the two forms. 
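
The interpolation can be sketched as follows. The sketch assumes 20 equally difficult items per form and a correction factor of -2, as in the generated example, and it computes the ability scale from the closed form log(r/(20-r)) instead of reading it from a printout; all names are hypothetical.

```python
import math

N_ITEMS = 20
CORRECTION = -2.0  # simple-form scale minus difficult-form scale

def theta(raw_score):
    """Person parameter for a raw score on a form of equally difficult items."""
    return math.log(raw_score / (N_ITEMS - raw_score))

# (ability, raw score) pairs for the difficult form, in increasing order.
difficult_scale = [(theta(r), r) for r in range(1, N_ITEMS)]

def equivalent_difficult_score(simple_raw_score):
    """Difficult-form raw score at the same ability, by linear interpolation."""
    target = theta(simple_raw_score) + CORRECTION
    if target <= difficult_scale[0][0]:
        return float(difficult_scale[0][1])   # below the tabled range
    for (t_lo, r_lo), (t_hi, r_hi) in zip(difficult_scale, difficult_scale[1:]):
        if t_lo <= target <= t_hi:
            return r_lo + (target - t_lo) / (t_hi - t_lo) * (r_hi - r_lo)
    return float(difficult_scale[-1][1])      # above the tabled range

for r in (6, 10, 14, 18):
    print(r, round(equivalent_difficult_score(r), 2))
```

Simple-form raw scores below 6 fall outside the ability range covered by the difficult form, so the sketch clamps them to the lowest tabled score.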




Figure 5.3. Raw scores on the simple and difficult forms
corresponding to the same level of ability. (Horizontal axis:
raw score on the simple form.)



It should perhaps be pointed out that if we did test a sample
with both forms we would not find this particular curvilinear
relationship between raw scores on the two forms even for
very large samples. The reason for this is that there is a
regression towards the mean, i.e. those examinees with bad
(good) luck on the simple form can on the whole not be
expected to have an equally bad (good) luck on the difficult form
and vice versa. The conversion should thus not be interpreted
to mean that it gives the expected raw score on one form
given the raw score on another form; rather it tells what raw
score would have been found if the other form had been used
instead, everything else being constant.

Test linking 

Also in linking tests the purpose is to estimate on the same
scale results obtained on different tests but in this case the
tests are given to different samples; the linking is made
possible through use of a subset of, say, 10-20 items common to
both (or all) tests.

A version of the difficulty method described above is used, in
that the mean of the item parameters for the common subset is
estimated in the context of each test. The difference between
the means of the estimates of the parameters indicates of
course the difference between the origins of the scales of the
item parameters in the two tests and can be used as a
correction or translation factor. Thereafter the ability scale
associated with the "translated" item parameters must be
computed (handy computer programs which perform this task can be
found in Wright & Panchapakesan, 1969; Kifer, Mattsson &
Carlid, 1975; Rentz & Bashaw, 1975), which makes it possible
to translate into the ability scale of one of the tests raw
scores obtained on the other.
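
The translation step can be sketched in a few lines. The item parameter estimates below are made up for illustration, they are assumed to be expressed on the log scale, and the function names are hypothetical.

```python
from statistics import mean

def translation_constant(common_in_test_a, common_in_test_b):
    """Constant that moves test-B item parameters onto the test-A scale.

    Both arguments hold the estimates of the same common items, each set
    obtained in the context of its own test.
    """
    return mean(common_in_test_a) - mean(common_in_test_b)

def translate(item_parameters, constant):
    """Add the translation constant to every item parameter."""
    return [p + constant for p in item_parameters]

# Made-up estimates of three common items in the two analyses.
common_a = [-0.50, 0.10, 0.70]
common_b = [-1.20, -0.55, 0.05]

constant = translation_constant(common_a, common_b)
print(round(constant, 3), [round(p, 3) for p in translate(common_b, constant)])
```

After translation the common items have the same mean in both tests; the remaining item-by-item differences reflect estimation error or, if they are large, a lack of fit of the common items.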

No example of how this can be done in practice is presented
here; the reader is instead referred to Wright (1977) for
further details and more elaborate linking designs and to
Kifer et al. (1975; see also Kifer, 1976), as they do
present an easily followed example.




It probably need not be said that no linking should be
attempted unless the tests measure the same ability. The fact that
the tests do have a subset of items in common of course makes 
it possible to test this assumption: if all tests fit the mo- 
del they must in fact measure the same ability. 

Item banks 

Virtually all the applications discussed in this chapter
presuppose that there exists a pool of items measuring the same
ability and for which the item difficulties have the same
origin of scale. It is obvious that when such a pool of items
is at hand a large range of measurement problems can be solved
with great efficiency and simplicity; tests can be optimized
for specific purposes and tailored testing becomes possible.
Furthermore, all possible tests which can be constructed by
selection of items from the pool are automatically equated
(even though it is of course necessary to compute the
associated ability scale for each selection of items so that the
observed raw scores can be translated into the common metric).

The most effective way of developing item banks is of course
to successively link new items into the bank, using the
procedures of test linking described above. But it is important
that an eye is kept on the fit of the items throughout: a bank
consisting of heterogeneous items with a poor fit is probably
worse than no bank at all; the strong claims which can be
advanced in relation to the Rasch model are valid when the model
holds true, otherwise not.






Chapter 6 
GENERALIZATIONS OF THE RASCH MODEL 



This report treats in detail only the simplest case, i.e. that
in which the model specifies only two parameters and there are
only two categories of answer. (Even though the wording has
been phrased in terms of measurement of ability there is of
course nothing that says that the model cannot be used to
measure personality, attitudes and so on.) There are, however,
developments of the basic model which can deal with more
complex situations and the parameter structure of the model
can be transformed in different ways. Some of these
generalizations of the model will be briefly mentioned below.

6.1 The polychotomous case



It is possible to generalize the model to treat the case where
there are more than two categories of answer, as is for example
often the case in attitude questionnaires (Andersen, 1973;
Fischer, 1974, p. 424 ff.; Allerup & Sorber, 1977).

Instead of observing whether a particular answer is correct or
incorrect we observe which particular answer category
($h$, $h = 1, \ldots, m$) a person $v$ endorses on item $i$. We
can represent the answer by using a selection vector
$(A_{vi}) = (A_{vi}^{(1)}, \ldots, A_{vi}^{(h)}, \ldots, A_{vi}^{(m)})$
which contains zeroes for all the alternatives
not chosen and a one for the category endorsed. If there, for
example, are three categories of answer and a person chooses
the last for a particular item this is represented with the
selection vector (0,0,1).

Instead of one person parameter there is in the polychotomous
case a vector of person parameters, the elements of which
indicate the tendency for each person to choose each
alternative: $(\theta_v) = (\theta_v^{(1)}, \ldots, \theta_v^{(h)}, \ldots, \theta_v^{(m)})$.
In the same way there is for each item a vector of parameters
representing the tendency for each alternative to be chosen:
$(\epsilon_i) = (\epsilon_i^{(1)}, \ldots, \epsilon_i^{(h)}, \ldots, \epsilon_i^{(m)})$.



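We need, however, to impose a constraint on these vectors of
parameters and we can use $\theta_v^{(m)}$ and $\epsilon_i^{(m)}$
for unity normation, i.e. they are put equal to unity. We can
then write the basic model in the following way:

$$P(A_{vi}^{(1)} = 1 \mid v, i) = \frac{\theta_v^{(1)}\epsilon_i^{(1)}}{1 + \sum_{h=1}^{m-1}\theta_v^{(h)}\epsilon_i^{(h)}}$$

$$\vdots$$

$$P(A_{vi}^{(h)} = 1 \mid v, i) = \frac{\theta_v^{(h)}\epsilon_i^{(h)}}{1 + \sum_{h=1}^{m-1}\theta_v^{(h)}\epsilon_i^{(h)}} \qquad (6.1.1)$$

$$\vdots$$

$$P(A_{vi}^{(m)} = 1 \mid v, i) = \frac{1}{1 + \sum_{h=1}^{m-1}\theta_v^{(h)}\epsilon_i^{(h)}}$$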
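
The category probabilities can be computed with a few lines of code. The sketch below assumes the multiplicative form of the model, with one person parameter and one item parameter per answer category and the parameters of the last category normed to unity; the names theta and epsilon and the numerical values are hypothetical.

```python
def category_probabilities(theta, epsilon):
    """Probability of each answer category for one person and one item.

    theta:   person parameters, one per category, last one normed to 1.
    epsilon: item parameters, one per category, last one normed to 1.
    """
    if len(theta) != len(epsilon):
        raise ValueError("theta and epsilon must have the same length")
    weights = [t * e for t, e in zip(theta, epsilon)]
    # With the last product equal to 1 the total is 1 + the sum over the
    # first m-1 categories, i.e. the common denominator of the model.
    total = sum(weights)
    return [w / total for w in weights]

# Made-up parameters for m = 3 categories.
probabilities = category_probabilities([2.0, 0.5, 1.0], [1.5, 2.0, 1.0])
print([round(p, 2) for p in probabilities])
```

Because the last category is normed to unity, only m-1 products vary freely, which corresponds to the m-1 dimensions of the model.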



Thus, the ICC is for each answer category here
multidimensional and there are m-1 dimensions. But of course the notion of
unidimensionality is as important here as everywhere else so
it may be asked whether the multidimensional model may, in
fact, be reduced into a unidimensional one. This can be done
if it is possible to find a unidimensional vector of item
parameters ($\epsilon_i$), $i = 1, \ldots, k$, and a
"scoring-vector" ($\phi_h$, $h = 1, \ldots, m$) so that for all
items $\log \epsilon_i^{(h)} = \phi_h \log \epsilon_i$.

There are great technical complexities in obtaining CML
estimates of the parameters. Allerup and Sorber (1977) have,
however, presented a computer program for this purpose, based
on methods for computing the symmetric functions and solving
the equations suggested by Andersen (1972). This program also
tests the hypothesis that the multidimensional model can be
reduced into a unidimensional one and provides the necessary
information for performing goodness of fit tests. There also
exist approximations to the strictly conditional approach:
Fischer (1974, p. 571) has presented such a program for the
case where there are three categories of answer, and methods
for obtaining unconditional estimates have also been developed
(Andrich, 1977). Examples of applications of the polychotomous
Rasch model have been presented by Fischer (1974, p. 478 ff.).

6.2 The linear logistic model

In the basic Rasch model there is one difficulty parameter for
each item; it is, however, possible to construct models with
another parameter structure. A very interesting model results
when the item parameters are replaced with a smaller number
of "basic parameters" ($\eta_j$, $j = 1, \ldots, m$)
representing, for example, hypothesized processes which appear
with different frequency in different items. By specifying one
parameter for each process and the frequency with which it has
to be carried out, the difficulty parameters can be
"explained". We thus want to investigate the hypothesis that
$\log \epsilon_i = \sum_{j=1}^{m} f_{ij}\eta_j$, which can be
tested empirically when the matrix of frequencies $((f_{ij}))$
has the rank $m$, and when $m < k$.
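
The hypothesized structure can be illustrated with a small sketch in which the item parameters on the log scale are generated from a frequency matrix and a vector of basic parameters. The matrix, the parameter values and the function name are made up for the illustration; estimation of the basic parameters is not attempted here.

```python
def item_log_parameters(frequencies, basic_parameters):
    """Log item parameters implied by the linear logistic structure.

    frequencies:      k rows, one per item, each with m frequencies f_ij.
    basic_parameters: the m basic parameters eta_j.
    """
    return [
        sum(f * eta for f, eta in zip(row, basic_parameters))
        for row in frequencies
    ]

# Made-up example with k = 3 items and m = 2 hypothesized processes.
frequencies = [
    [1, 0],   # item 1 requires process 1 once
    [0, 1],   # item 2 requires process 2 once
    [2, 1],   # item 3 requires process 1 twice and process 2 once
]
basic_parameters = [-0.4, -0.9]

print([round(v, 2) for v in item_log_parameters(frequencies, basic_parameters)])
```

In this example the third item parameter is fully determined by the first two, which is what makes the hypothesis restrictive: k item parameters are accounted for by m < k basic parameters.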

The model has been presented in detail by Fischer (1974, p.
340 ff.; a computer program is also presented); Fischer
(1974) and Lybeck (1974) discuss some very interesting possible
applications in an educational context. It should be pointed
out, however, that Kempf and Niehausen (1976) have criticized
this approach on the basis of lack of interpretability of
the "basic parameters". They suggest instead that error types
should be analyzed with a polychotomous model.

Dynamic models in which "transfer effects" are represented 
with special parameters have also been proposed and used 
(Spada, 1976; Kempf, 1976; Kempf, Niehausen & Mach, 1976). 
Such models can be used to investigate learning effects from 
one item to another as a threat to the validity of the basic
model, but can of course also be used to investigate
substantive problems of great interest.

6.3 Analyses of experimental data

The linear logistic models mentioned above can be used to
analyze data from experimental studies (see e.g. Kempf et al.,
1976). But as has been pointed out by Fischer (1974, p. 506)
it is also possible to formulate linear logistic models
resembling the analysis of variance model, i.e. with parameters
representing treatment and interaction effects of different
kinds. Such models would entail one single assumption (which
is also empirically testable), namely that there is an
additive or, equivalently, a multiplicative relationship between the
parameters, and they would fill a deeply felt need for sound
statistical models for the analysis of qualitative data.






Chapter 7 
THE PML PROGRAM 



The computer program is written in FORTRAN IV and was
developed on the IBM machines (360/65 and 370/148) at GUC
(Gothenburg Universities' Computing Center). The program should,
however, only to a small degree be machine dependent (one
version of the program at least) so it can probably relatively
easily be implemented on other machines.

7.1 The two versions of PML

There are two versions of the program: one OSIRIS version 
calling routines in the OSIRIS III (1973) subroutine library 
and a non-OSIRIS version (or rather a simplified OSIRIS ver- 
sion) in which all routines called are included in the source 
code. The OSIRIS version can of course only be used at compu- 
ter installations where the OSIRIS system is implemented. 

The OSIRIS system has four important advantages: 

- A self descriptive data structure is used, i.e. for each data 
file there is an associated dictionary file containing 
descriptions of the data file such as variable numbers, 
variable locations and names of the variables. This implies 

that the variables (items) can be referred to with a variable
number which remains constant from analysis to analysis
and that the variables are easily identified on the printout 
since they have a unique name. 

- Specification of the control parameters for each run is easy 
since keywords are specified in a completely free format. 

- Selection of any subset of cases is easily effected through 
the special filtering feature. 

- Since the input routines are coded in Assembler they are 
very fast.




In the non-OSIRIS version of the program some of these advan- 
tages are lacking: no filtering is possible and fixed format 
specification of a few of the control parameters is necessary. 
However, to maximize the similarity between the versions,
and to gain some of the advantages of OSIRIS, a simplified
OSIRIS structure has been created (this work has been done by
Jan-Gunnar Tingsell at the Department of Educational Research,
University of Göteborg) in which a simplified dictionary file
is used along with the data file (see below).

The two versions of the program thus differ with respect to
the input routines used; in the analysis parts of the programs 
there are no differences. 

7.2 Obtaining a copy of the program 

The source code punched on cards (or written on a tape sent to
me) may be acquired from the Institute of Education,
University of Göteborg, by writing to the present author. A fee is
charged corresponding to the price of the cards and the costs
involved in handling and shipping. Please indicate whether
the OSIRIS or the non-OSIRIS version of PML is desired.

7.3 Using PML

Since the control information needed for the two versions of
the program is somewhat different and is specified in
different ways, the instructions for use will be specified
separately. Some advice about choice of options is also given below,
but only in connection with the OSIRIS version.

How to use the OSIRIS version 

The control cards for the OSIRIS version are specified
according to the standard OSIRIS III (1973) syntax and there
is no need to describe the details here. Three or four
statements (mostly corresponding to the same number of cards) are
necessary as input:

1. Filter statement (optional)

2. Title card (80 characters of information to label the output)

3. Global parameters (selected from the 15 parameters described
below)

4. Variable list.

The global parameters are selected from those described below 
(defaults are underlined) 

PRINT=DICT/NODI
    DICT: Print the dictionary.
    NODI: Do not print the dictionary.

DESC/NODE
    DESC: Only descriptive information (e.g. proportion of
    correct answers, point-biserial correlations, and the item
    by score group frequency matrix of correct answers) is
    supplied without any estimation of item and person
    parameters. This keyword can be specified to make sure in an
    economical way that there are no items with a very high
    proportion of correct answers (which causes a slow
    convergence). Another usage is to have a look at the item by
    score group frequency matrix in order to specify a suitable
    minimum group size for the Andersen test (see page 49 above).
    NODE: A full analysis, according to the other options
    chosen, is performed.

MAXI=N
    The maximum number of iterations in the estimation of the
    item parameters. The default is N=250. If convergence has
    not been obtained within the specified number of iterations
    PML will assume that this has occurred when MAXI is reached
    and will continue with the other tasks set up. The maximum
    number of iterations in estimating the person parameters is
    taken as 4N.






    The accuracy required in the estimation of item and person
    parameters in terms of number of decimal places. The default
    is N=3. For some purposes a lower accuracy can be demanded
    but certainly not when the overall numerical tests are to be
    computed. The variance-covariance matrices which are
    inverted in the computation of the Martin-Löf test (see page
    51 ff.) may for example not be positive definite when
    accuracy is too low.

ALGO=DIFF/SUM
    DIFF: The symmetric functions are computed with the
    Difference algorithm (see page 31). Since this algorithm is
    sensitive to roundoff errors it should not be used when the
    number of items is large and/or there is a great range of
    item parameters. This algorithm seldom works when k>40 and
    it seldom fails when k<20. It should be pointed out,
    however, that even though this algorithm may work well in
    estimating the item parameters for the whole sample it may
    break down when the Andersen test is computed. When this
    test is requested this algorithm should thus be avoided
    unless k<20. There is no risk, however, of getting wrong
    results as a consequence of roundoff errors since the
    program is stopped when computational accuracy gets too low.
    SUM: The symmetric functions are computed with the
    Summation algorithm (see page 32). This algorithm works in
    those cases in which the Difference algorithm fails but it
    is somewhat slower.



    This keyword is effective only when ALGO=SUM is chosen.
    SING: The symmetric functions are computed with single
    precision arithmetic. This keyword should be chosen only
    when it is essential to keep the amount of computer time to
    a minimum. Observe that there is no test of computational
    accuracy in the SUM algorithm.
    DOUB: The symmetric functions are computed with double
    precision arithmetic.

    APPR: The approximation suggested by Martin-Löf (1973, see
    page 34 above) is used to compute start values for the
    iterations. This keyword can be chosen regularly.
    UNIT: Unities are used as start values for the iterations.

    N = the variable number of the item chosen for unity
    normation. The default is the item of medium difficulty.

    EXTR: The Aitken extrapolation (see page 35) is used to
    speed up convergence of the iterations. This default value
    can be used regularly but if the iterations should diverge
    the extrapolation may be the explanation.
    NOEX: No extrapolation is done.

    PERS: The person parameters are estimated.
    NOPE: The person parameters are not estimated. In a process
    of item selection and goodness of fit testing it may be a
    waste to estimate the person parameters in each analysis.
    But it is of course not possible to obtain estimates of the
    standard errors of the estimated item parameters if the
    person parameters are not computed.






PLOT/NOPL
    PLOT: For each item a printerplot is made of the observed
    proportion of correct answers against the proportion
    predicted for each score group (see chapter 3.1). Observe
    that these plots produce a large amount of lines as output.
    NOPL: No plots are made.

BINO/NOBI
    BINO: For each item and for each score group a binomial
    test is carried out to test the difference between observed
    and predicted frequencies of correct answer (see pages
    46-47). The power of these tests is lower than the "power"
    of the printerplots but they may at times be useful. They
    also present the numerical information on which the
    printerplots are based.
    NOBI: No binomial tests are carried out.

NOBS=N
    N is the smallest size allowed for a score group if it is
    to be considered in the printerplots or in the binomial
    tests. The default is N=5.

TEST=CHIS/LIKE/BOTH/NONE
    CHIS: The Martin-Löf chi-square goodness of fit test is
    computed (see chapter 3.2).
    LIKE: The Andersen conditional likelihood ratio test is
    computed (see chapter 3.2).
    BOTH: Both the overall numerical tests are computed. This
    keyword should be chosen only rarely, especially if k is
    large, for economical reasons.
    NONE: No overall test is computed.

NIND=N
    N is the minimum number of persons allowed within each
    range of scores when the Andersen test is computed. The
    default is N=100.






A fairly typical example of the setup, including the JCL,
required for executing the OSIRIS version of PML on an IBM
machine under OS is shown below:

    //UPEJEO JOB ...
    /*JOBPARM RTIME=3,LINES=6K
    // EXEC ...        (referring to the library where PML is to be found)
    //DICTIN DD ...    (description of the dictionary file)
    //DATAIN DD ...    (description of the data file)
    //FT12F001 DD UNIT=SYSSQ,DISP=(,PASS),SPACE=(TRK,(50,20)),
    // DCB=(RECFM=VBS,BLKSIZE=6006)
                       (description of the scratch file used in the
                       computation of the Martin-Löf test)
    //FT01F001 DD *    (observe that the instream is defined as unit 1)
    INCLUDE V3=1*      (Filter card)
    BOYS IN GRADE 6    (Title card)
    ALGO=SUM PLOT*     (Parameter card)
    V121-V140,V145*    (Variable list)
    /*



How to use the non-OSIRIS version 



In OSIRIS the dictionary file is created with a special
program. Also in the non-OSIRIS version of PML a dictionary file
is used; here, however, the dictionary is simply punched on
cards (but of course the card images can be stored on a disc
or a tape). The non-OSIRIS dictionary must be prepared in the
following way:



1st card:

    pos 1-3     Logical record length (LRECL) for each record
                in the data file. (If the data are on cards
                LRECL is of course 80; if there are more items
                than can be contained on one card it is
                necessary first to create a file with a greater
                LRECL.)
    pos 4-6     The variable number of the first item described
                in the dictionary (need not be 1).
    pos 7-9     The variable number of the last item described
                in the dictionary.

2nd and following cards:

    pos 1-3     Variable number
    pos 4-27    Variable name
    pos 28-30   Column location in the data file

The variables must be continuously numbered between the first
and the last variable number, but there is no restriction as
to where in the record the different variables are located.
It must be observed, however, that the information for each
item must be punched in only one column (i.e. using I1 format),
and that the responses of course must be coded 0 and 1. At
most 200 items can be described in the dictionary.

An example is given below: 



    181 28 43
    028VOK A 1                 062
    029VOK A 2                 063
    030VOK A 3                 064
    031VOK A 4                 065
    032VOK A 5                 066
    033VOK A 6                 067
    034VOK A 7                 068
    035VOK A 8                 069
    036VOK A 9                 070
    037VOK A 10                071
    038VOK A 11                072
    039VOK A 12                073
    040VOK A 13                074
    041VOK A 14                075
    042VOK A 15                076
    043VOK A 16                077




This dictionary describes 16 items in a data file with
LRECL=181. The variable number for the first item has been
taken to be 28; if, for example, there is another subtest
preceding this one which in a later step is to be analyzed
together with these items, the same variable numbers can be used.






In executing the non-OSIRIS version of PML there are 4 control
statements (usually the same number of cards) which must be
supplied:

1. Title card (80 characters of information to label the output)

2. Keyword parameter card (keywords are selected from those
   described below)

3. Fixed format parameter card (is prepared according to the
   instructions given below)

4. Variable list (see below)

The keyword parameter card should contain a selection from the 
keywords described below: 

NODI/DICT             NODI: The dictionary is not printed.
                      DICT: The descriptions in the dictionary
                      for the variables selected in the
                      variable list are printed.

DIFF/SUMM             DIFF: The symmetric functions are computed
                      with the Difference algorithm.
                      SUMM: The symmetric functions are computed
                      with the Summation algorithm.

DOUB/SING             This keyword is effective only when SUMM
                      is chosen.
                      DOUB: Double precision arithmetic is used.
                      SING: Single precision arithmetic is used
                      for computing the symmetric functions.

APPR/UNIT             APPR: Start values for the iterations are
                      computed according to an approximation
                      (see page 3'^).
                      UNIT: Unities are used as start values in
                      solving the equations for the item
                      parameters.

EXTR/NOEX             EXTR: The Aitken extrapolation (see page
                      35) is used to speed up convergence of
                      the iterations.
                      NOEX: No extrapolation is used.

PERS/NOPE             PERS: The person parameters are estimated.
                      NOPE: The person parameters are not
                      estimated.

NODE/DESC             NODE: A full analysis is performed.
                      DESC: Only descriptive information is
                      presented, without any estimation of item
                      and person parameters.

PLOT/NOPL             PLOT: For each item a printerplot is made
                      as a graphic test.
                      NOPL: No printerplot is made.

BINO/NOBI             BINO: For each item and for each score
                      group a binomial test is carried out to
                      test the difference between observed
                      and predicted frequencies of correct
                      answers.
                      NOBI: No binomial test is carried out.

NONE/CHIS/LIKE/BOTH   NONE: No overall numerical test of
                      goodness of fit is computed.
                      CHIS: The Martin-Löf chi-square goodness
                      of fit test is computed.
                      LIKE: The Andersen conditional likelihood
                      ratio test is computed.
                      BOTH: Both the overall numerical tests are
                      computed.
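
The difference between the two algorithms named by the DIFF and SUMM keywords can be illustrated with a minimal sketch in Python. It is not the code of the GAMMA or GAM routines; the function names are hypothetical, and the sketch only assumes the usual definition of the elementary symmetric functions of the item parameters. The first function uses additions and multiplications of positive numbers only. The second obtains the functions for the test with one item removed by repeated subtraction, which is the step where round-off errors can accumulate when the number of items is large.

```python
def symmetric_functions(eps):
    """Elementary symmetric functions gamma_0..gamma_k of the item
    parameters eps, built up item by item with sums and products only."""
    gamma = [1.0] + [0.0] * len(eps)
    for i, e in enumerate(eps, start=1):
        # update from the highest order downwards so that gamma[r - 1]
        # still refers to the functions of the first i - 1 items
        for r in range(i, 0, -1):
            gamma[r] += gamma[r - 1] * e
    return gamma


def symmetric_functions_without(gamma, e_i):
    """Symmetric functions of the remaining items when the item with
    parameter e_i is removed, obtained by subtraction from gamma."""
    reduced = [1.0]
    for r in range(1, len(gamma) - 1):
        reduced.append(gamma[r] - e_i * reduced[r - 1])
    return reduced
```

For three items with parameters 1, 2 and 3 the first function gives 1, 6, 11, 6, and removing the item with parameter 1 gives 1, 5, 6, the symmetric functions of 2 and 3.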

The keywords selected to override the defaults are written on
the keyword parameter card, beginning in the first position.
The keywords may be specified in any order and are separated by
commas or blanks. The list of keywords must be ended with an
asterisk.

An example is given below: 
DICT SUMM PLOT LIKE* 
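
How such a keyword card could be read is sketched below in Python. This is an illustration only, not the routine used in PML; the names KEYWORD_GROUPS and parse_keyword_card are hypothetical. The sketch assumes the rules stated above: keywords in any order, separated by commas or blanks, and terminated by an asterisk. It additionally rejects two keywords from the same group, which is an assumption about sensible behaviour rather than documented behaviour.

```python
KEYWORD_GROUPS = [
    ("NODI", "DICT"), ("DIFF", "SUMM"), ("DOUB", "SING"),
    ("APPR", "UNIT"), ("EXTR", "NOEX"), ("PERS", "NOPE"),
    ("NODE", "DESC"), ("PLOT", "NOPL"), ("BINO", "NOBI"),
    ("NONE", "CHIS", "LIKE", "BOTH"),
]


def parse_keyword_card(card):
    """Return the keywords on the card in the order they were given."""
    text, star, _ = card.partition("*")
    if not star:
        raise ValueError("the keyword list must be ended with an asterisk")
    chosen = {}
    for word in text.replace(",", " ").split():
        group = next((g for g in KEYWORD_GROUPS if word in g), None)
        if group is None:
            raise ValueError("unknown keyword: " + word)
        if group in chosen:
            raise ValueError("conflicting keywords: " + chosen[group] + ", " + word)
        chosen[group] = word
    return list(chosen.values())
```

Applied to the example card above, the sketch returns DICT, SUMM, PLOT and LIKE; all other options keep whatever defaults the program uses.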

The fixed format parameter card is prepared in the following
way:

pos

1-4    Maximum number of iterations in estimating the
       item parameters (MAXI). If left blank MAXI is
       assumed to be 250. The maximum number of
       iterations in estimating the person parameters is
. taken to be ^ times MAXI. 

5-8    The accuracy required in the estimation of the
       item and person parameters in terms of number
       of decimal places (ERRO). If left blank ERRO
       is assumed to be 3.

9-12   The minimum number of persons allowed within
       each range of scores when the Andersen test is
       computed (NIND). If left blank NIND is assumed
       to be 100.

13-16  The smallest size allowed for a score group if
       it is to be considered in the printerplots or
       the binomial tests (NOBS). If left blank NOBS
       is assumed to be 5.

17-20  The variable number, according to the dictionary,
       of the item chosen for unity normation (NORM).
       If left blank the item of medium difficulty is
       used for unity normation.

Even if there is nothing punched on the fixed format parameter 
card it must be physically in place, after the keyword para- 
meter card. An example is given below: 
 100     150  10
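
The reading of this card can be sketched as follows in Python. The sketch is illustrative and not taken from PML; the names FIELDS, DEFAULTS and parse_fixed_card are hypothetical, and the field positions and default values are those given in the description above. A blank NORM field is represented by None, standing for "use the item of medium difficulty".

```python
FIELDS = [("MAXI", 0, 4), ("ERRO", 4, 8), ("NIND", 8, 12),
          ("NOBS", 12, 16), ("NORM", 16, 20)]
DEFAULTS = {"MAXI": 250, "ERRO": 3, "NIND": 100, "NOBS": 5, "NORM": None}


def parse_fixed_card(card):
    """Read the five 4-column fields; blank fields keep their defaults."""
    card = card.ljust(20)
    params = dict(DEFAULTS)
    for name, start, stop in FIELDS:
        field = card[start:stop].strip()
        if field:
            params[name] = int(field)
    return params


def convergence_criterion(params):
    """ERRO is a number of decimal places, e.g. 3 gives 0.001."""
    return 10.0 ** (-params["ERRO"])
```

A completely blank card, which must still be physically in place, simply yields the defaults.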

The variable list must contain a list of the variable numbers
for those items to be included in the analysis. Each variable
number must be specified with three digits (e.g. 006) and the
numbers should be separated with a comma or a hyphen, where the
hyphen indicates that a range of items is selected. The
variable list must be started in position 1 and as many cards
as are necessary may be used. Each card must be filled,
however, and the comma is the only sign which is allowed in
column 80, if continuation to a new card is to be made. The
variable list must be ended with an asterisk. An example of
a variable list could be:



005,008-bl7,03'«* -.^ 
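
The rules for the variable list can be made concrete with a small sketch in Python. It is an illustration under the rules stated above (three-digit numbers, comma or hyphen as separators, asterisk as terminator) and not the routine used in PML; the name parse_variable_list is hypothetical, and continuation cards are simply joined after trailing blanks have been removed.

```python
def parse_variable_list(cards):
    """Expand a variable list such as '006,010-012*' into variable numbers."""
    text = "".join(card.strip() for card in cards)
    body, star, _ = text.partition("*")
    if not star:
        raise ValueError("the variable list must be ended with an asterisk")
    numbers = []
    for token in body.split(","):
        low, hyphen, high = token.partition("-")
        if len(low) != 3 or (hyphen and len(high) != 3):
            raise ValueError("variable numbers must have three digits")
        if hyphen:
            # a hyphen selects the whole range, both ends included
            numbers.extend(range(int(low), int(high) + 1))
        else:
            numbers.append(int(low))
    return numbers
```

With the hypothetical input 006,010-012* the sketch returns the variable numbers 6, 10, 11 and 12.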

In executing the non-OSIRIS version of PML the control cards
are read from unit 1, the dictionary from unit 13 and the
data from unit 11. A fairly typical example of the setup,
including the JCL, for executing this version of PML on an IBM
machine is shown below:

//UPEJEG JOB
// EXEC ...           (referring to the library where PML is to be found)
//FT12F001 DD UNIT=SYSSQ,DISP=(,PASS),SPACE=(TRK,(50,20)),
// DCB=(RECFM=VBS,BLKSIZE=6000)   (description of the scratch file used in
                                   the computation of the Martin-Löf test)
//FT11F001 DD ...     (description of the data file)
//FT01F001 DD *
GRADE 6               (Title card)
SUMM PLOT*            (Keyword parameter card)
 150                  (Fixed format parameter card)
121-110,115*          (Variable list)
//FT13F001 DD *
256 78192             (The dictionary)
078GRAMMARTEST 1, ITEM 1   112
...
192GRAMMARTEST 1, ITEM 115 226
/*



7.4 The most important subroutines






READ      reads the data, forms the item by score group
          frequency matrix and computes the proportions of
          correct answers, the point-biserial correlations
          (with the item included in the test) and the KR-20.

PAREST    administers the iterative solution of the equations
          for the item parameters.

GAMMA     is used to compute the symmetric functions with the
          Difference algorithm. This routine has been written
          by Fischer (1974).

GAMMA2    supervises the computation of the symmetric functions
          with the Summation algorithm and calls repeatedly the

GAM       routine, which is a slightly changed version of a
          routine presented by Fischer (1974), or the

GAME      routine, which is a single precision version of GAM.

AITKEN    computes the Aitken extrapolation, if requested. It
          is called by PAREST.

PERS      estimates the person parameters iteratively using
          the Newton-Raphson method. This subroutine has
          been taken from Fischer (1974) but code for
          computing start values has been added. The present
          version also computes the standard errors of the
          person parameters and the routine calls

ITINFO    which computes the standard errors of the item
          parameters.

ITTEST    administers the analysis of the items and calls

PLOTT     which produces the printerplots and

DPIBJN    which computes the cumulative binomial distribution.
          The latter routine has been taken from Allerup and
          Sorber (1977).

PMLCHI    administers the computation of the Martin-Löf
          chi-square test but most of the computational work
          is carried out in

STORVA    and in the two SSP routines

DMFSD     and

DSINV     which invert the variance-covariance matrices.

EBACHI    groups the score groups and computes the Andersen
          likelihood ratio test by calling PAREST as many
          times as groups found.
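
As an illustration of the kind of iteration carried out for the person parameters, the following Python sketch solves the likelihood equation for one raw score with the Newton-Raphson method. It is not a transcription of PERS. The function name estimate_ability, the logit parametrization (ability minus item difficulty) and the simple start value are assumptions made for the example; the program itself works with its own normation and start values.

```python
import math


def estimate_ability(difficulties, raw_score, tol=1e-3, max_iter=250):
    """Newton-Raphson estimate of the ability (logit scale) belonging to a
    raw score, together with its approximate standard error."""
    k = len(difficulties)
    if not 0 < raw_score < k:
        raise ValueError("no finite estimate exists for zero or full scores")

    def probabilities(ability):
        # Rasch model: probability of a correct answer to each item
        return [1.0 / (1.0 + math.exp(d - ability)) for d in difficulties]

    # start value: logit of the proportion correct, shifted by the mean difficulty
    ability = math.log(raw_score / (k - raw_score)) + sum(difficulties) / k
    for _ in range(max_iter):
        probs = probabilities(ability)
        information = sum(p * (1.0 - p) for p in probs)
        step = (raw_score - sum(probs)) / information
        ability += step
        if abs(step) < tol:
            break
    information = sum(p * (1.0 - p) for p in probabilities(ability))
    return ability, 1.0 / math.sqrt(information)
```

The standard error is taken as the inverse square root of the test information at the estimate, which is the usual large-sample approximation.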



7.5 Dimensioning of the program

The version which is delivered is dimensioned for k_max = 60.
Dummy dimensions are, however, used almost throughout, so it
is easy to dimension the program for both smaller and larger
problems. The following arrays must be changed in MAIN, with
K as the maximum number of items:

INTEGER V(K),VMD1(K),VMD2(K),NIS(K,K),NR(K),AOI(K)
INTEGER*2 LIST(K),KDIFF(K)
REAL*4 WK2(2,K),W(K)
REAL*8 EPS(K),EPSI(K),G(K),GI(K,K),WK3(3,K),THETA(K),SAVE(K),
       VARKOV(K*(K+1)/2)

In GAM there are two arrays the dimensions of which must be
changed:

REAL*8 X(K),Y(K)

and in GAME there are three:

REAL*4 E(K),X(K),Y(K)

Since the program tests that no attempts are made to analyze
greater sets of items than it is dimensioned for, an IF
statement must be changed too. This test is made in MAIN
immediately after the variable list has been read.

Furthermore, in any implementation of PML there is one more
array the size of which must be considered. As was mentioned
above on page 53 an array is used to store as many matrices
of second derivatives of the symmetric functions as possible.
This array (STOR) should be dimensioned to be as large as the
available core allows. It is also necessary that the size of
STOR is represented as the integer constant in the statement
immediately preceding the call to PMLCHI in MAIN.
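
The dimension K*(K+1)/2 given for VARKOV corresponds to storing only one triangle of a symmetric matrix. A minimal Python sketch of such packed storage is given below. It assumes the column-wise upper-triangular layout that the SSP routines for symmetric matrices conventionally use; the function names are hypothetical and the sketch is not taken from the program.

```python
def packed_size(k):
    """Number of elements needed for a symmetric k by k matrix."""
    return k * (k + 1) // 2


def packed_index(i, j):
    """1-based position of element (i, j) when the upper triangle is
    stored column by column (assumed layout)."""
    if i > j:
        i, j = j, i          # symmetry: (i, j) and (j, i) share a position
    return i + j * (j - 1) // 2


def pack_symmetric(matrix):
    """Copy the upper triangle of a full symmetric matrix into a vector."""
    k = len(matrix)
    packed = [0.0] * packed_size(k)
    for j in range(1, k + 1):
        for i in range(1, j + 1):
            packed[packed_index(i, j) - 1] = matrix[i - 1][j - 1]
    return packed
```

For the delivered dimensioning of 60 items the packed vector holds 1830 elements instead of the 3600 of a full matrix.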

7.6 A sample printout

On the following pages a sample printout from a run with the 
non-OSIRIS version of PML is shown. 






Page 1



FOLLOWING PARAMETERS OVERRIDE THE DEFAULTS:



8 

012-020*

NUMBER OF ITEMS                    9
MAXIMUM NUMBER OF ITERATIONS..   250
CRITERION FOR CONVERGENCE     0.0010

THE SYMMETRIC FUNCTIONS WILL BE COMPUTED WITH THE DIFFERENCE METHOD.
THE AITKEN EXTRAPOLATION WILL BE USED TO SPEED UP CONVERGENCE.

SCORE GROUPS WITH 5 OR FEWER PERSONS ARE NOT CONSIDERED IN THE PLOTS OR THE BINOMIAL TESTS.



Page 2 



THE FOLLOWING VARIABLES ARE INCLUDED:
     NAME                      LOC

[Listing of the nine included variables, 12 to 20 (NUMBER SERIES
ITEM 12 to ITEM 20), with their column locations.]

NUMBER OF CASES READ
NUMBER OF CASES WITH A ZERO SCORE
NUMBER OF CASES WITH A FULL SCORE
NUMBER OF CASES REMAINING FOR ANALYSIS



SJ 






Page 3 













POINT RISEMUL 


12 




SEWIFS 


ITtM 


l^ 


0.S78 


0.b30 


13 






ITt^ 


IS 




0.440 ' 


14 






ITtM 


14 




0.494 


IS 






ITEM 


iS 




0.4b7 


16 






ITEM 


10 




O.S(»3 


17 




5FWU5 


iTt-^ 


17 






IB 






I7f M 


Ih 




0.5S6 


19 






iTfM - 19 




U.SIB 


ao 


NUMRf.M 




ITtM 


ao 


b.sol 


O.SIO 



NORMATION ON VARIABLE 16



ThF PELIABILITY (^w-^0) is 0.6<> 






THE ITEM BY SCOREGROUP FREQUENCY MATRIX OF CORRECT ANSWERS





1 


if 


3 




b 


0 


7 


a 




12 




7 






3<» 


37 


63 


77 


271 


13 




16 




Jb 


b«? 


51 


t^ftb 


79 


336 


1* 




.8 






37 


3t» 


66 


75 


?80 


ib 




13 


30 




*6 


^4 


66 


BO 


31B 


16 


0 


^ 


11 




3b 


♦ 1 


SS 


80 


2b9 


17 










30 


3A 


5b 


71 




IH 


s 


? 


7 


] 19 


37 


4U 


61 


71 




19 






lb 


13 


at. 


31 


bi 


73 


£fi4 


^0 


s 


6 






^» 


3V 


5b 


74 


2Jb 




3h 






bf> 


6b 


60 


77 


as 





UOilKfcs -0. l6<*003iH9ir)*U4 • 



0 

NUMBER OF ITERATIONS FOR CONVERGENCE:



This is the maximum of the logarithm of the likelihood
function.






le 

13 
14 

IS 
16 

fr 

19 



UNITY NORMATION   PRODUCT NORMATION   PRODUCT NORMATION(LOG)   STANDARD ERROR   CONFIDENCE INTERVAL(95 %)



U3/0 
UUOOU 



1.0406b 
2.16410 
I.l4»%a4 
1.74S99 
U. 9179b 
0.7b<»9l 
0.7/0#9 
0,5/994 
0.71739 



-0.039A7 
-0.77200 
-0.13b27 
-0.5i57-i2 

o.oes6e 

0.?ftll6 
0.^607? 
U«b44(S3 
0.3321J 



o.ioass 

0.11618 
0.10919 
0. 11323 
t). 10794 
0.10/32 
0.10 736 
0.10719 
0.10123 



.a.2526tt 
•0.99971 
-0.349^H 
•0.77926 
-0.12b9S 
0^070<^i 
0.0SU29 
0.334 74 
0.12196 



0.17294 
-0.&4430 
0.07874 
-0.33538 
0.2971^ 
0.49I.S1 
0.47116 
0.7S492 
0.S4210 



Page 6 



ABILITY PARAMETERS



one 


PRODUCT NORMATION   PRODUCT NORMATION(LOG)   STANDARD ERROR   CONFIDENCE INTERVAL (95 %)


1 


0.1169B 


-2.Ub75 


1.07098 


•4*24487 


-0.04663 


2 


0.27317 


-1.29/6«> 


0.81468 


•2.H9442 


0.29911 


3 


0.^8/h3 


-0.7177V 


0.72088 


•2.13U71 


0.69bl3 




0.79bS/ 


-0.22669 


0.68449 


-l.b7029 


un29o 


S 


1.^6b6J 


0.23b'D7 


0.68370 


-1.10448 


l.*i*7b62 


6 


2.0S96i 


0.r22b2 


0.71664 


-U.68603 


2.13106 


7 


3.66^ib 


1.29811 


0.81142 


-0.29226 


2.H86«»9 


6 




2.13903 


1.06722 


0.04728 


4.^:3077 



NUMBER OF ITERATIONS FOR CONVERGENCE OF THE ABILITIES: 36.
ON THE tOr» SCALfc iHfr MUN OF Tnf bAHPtf IS 0.3b I^ITH TmE VAOUNCE 
Th» INDM or bUfiJtCT St^'AWATlON lb U.o/ 



1.63 







i 

r 



,0.34 0.3** U.iu 0,0b U.Ol 0,00 0.00 0.00 0.00 0.00 

O.IO 0.^7 O.Jl O.^X u.0»* O.U? 0.00 0.00 0.00 O.UO 

O.Or O.U 0.<^4 U.^i* O.^i 0.10 0.i)3 U.Ol U.OO O.UO 

o.ou ii.oi O.U 0.-?^ J.^/ o.?i o.n 0.0* o.ui o.oo 

U.Ou U.Ul U.U4 U.n 0.^7 0•^^ O.ll 0.U3 0.00 

n.OO O.UO O.Ol 0.03 O.lt' 0•^l 0.^9 0.<?4 O.il U.Ot' 

0.00 U.OO O.UO 0.00 0.6^ 0.0*i 0.?l 0.31 0.4:?7 O.iO 

U.OO O.UO U.UU U.OO U.UO 0.01 0.0t> 0.^0 O.JV 0.34 

'7 



Page 8 



THE VALUES OF THE SYMMETRIC FUNCTIONS

ORDER

1 U.<*MJ6r'84^7 j<rS<»ifJ^L>»Ol 

2 0.4l<*^4<>J^474i?o63fe0*0<? 

3 U. lUlSj40«*^ib^4?5b<*i:)»0 } 
% , U.ib'»7l0l7/4t/l69/6O»03 

5 0. l'D3*-*3ittbttH0^'b3^*5f»»0J 

6 0.l00iftl74^lUSf M<:lf>»OJ 

? o.4U'>4u7^6it>i JO^/^•u^ 

Q 0. 10UUOOOOOOOOOOUiL)*Oi 



\ 







5 

b 
7 



u 



37 



0*0 

o•^oo 

0»340 
0»6?S 
O.S?3 

^ ~ 



■ t:k 

0»b7| 
t 0*666 



TOO LOW



TOO HIGH OBSERVED PROP



The table shows the results from the binomial tests.
There is of course one table for each item.
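
The logic of such a binomial test can be sketched in Python as follows. The sketch is illustrative only and is not the DPIBJN routine; the function names are hypothetical. It assumes that, for one item and one score group, the observed number of correct answers is compared with a binomial distribution whose success probability is the proportion predicted by the model, and that the tail probability is computed in the direction of the deviation.

```python
from math import comb


def binomial_cdf(m, n, p):
    """Probability of at most m successes in n independent trials."""
    return sum(comb(n, x) * p ** x * (1.0 - p) ** (n - x) for x in range(m + 1))


def binomial_test(observed, n, predicted):
    """Direction of the deviation and the corresponding tail probability."""
    if observed / n > predicted:
        # too many correct answers: count the incorrect answers instead
        return "TOO HIGH", binomial_cdf(n - observed, n, 1.0 - predicted)
    return "TOO LOW", binomial_cdf(observed, n, predicted)
```

A small tail probability indicates that the observed proportion in the score group deviates from the predicted one by more than chance would suggest.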






Y-AXIS: OBSERVED PROPORTION
X-AXIS: PREDICTED PROPORTION

[Printerplot for one item; both axes are graded in steps of 0.1.]



For each item, a printerplot like the one above is produced. The symbol
X is used to mark along the y-axis the observed proportion, and the
symbol I is used to indicate the predicted proportion, i.e. the I's are
placed on the diagonal. When the X and the I coincide an I- c-.-.d



« 



Page 11 



THE SCORE GROUPS CONTRIBUTE TO THE CHI-SQUARE SUM AS FOLLOWS:
SCORE GROUP   NUMBER OF OBSERVATIONS   CONTRIBUTION
I 

e 

3 



b 
6 

0 



3tf 


13.0SH 


3b 


6.1b| 


bJ 


I0.b73 


b6 


lb.7S0 


6b 


3.*»TH 




S.49J 


77 


b.rtlb 


Hb 


b.70^ 



THF «A;.TTN.LtlF CHI-bOUAPF GUODMeSb OF FIT TfST GIVE'S rHl.SUUA».F= 6t>.03^ WITH b6 JFG^F.^ FREEDOM. 

The fr'tUl'NOANCT l'^: U.0l9bJ*>H 



ThF virjjMjH L.Mhl- of OhSkWVAlIuhb ..HHP. tACH r,Po M uOk-EO ^hfN COMPUTING The LIkFlI^^ipOo PaTI'. rpST !<; 100 
T**F FoLLWl\ri M.(^uHNb »-AS ^^H- N USti) 
*^A%GE IwtJ'^bt*^ OF DriStWVAlONb 

^ ' I ^'^^ NUMMt'^ OF Utl^ATinNS F0»^ C^N Vf l.*SM<C> : f 

1 ^'"^ , ^uM^-^»^ ITtwATlONS VWA co^iv^ : m 

' * ' ^^^^ ^uM^^u iTt^ATiONb hh c«vwfw«,^fMC'' : lo 

T-^ LlK^^lipO!- ^ATiO MUW.b Of MT TfST CHI-Si'U^^= ^*».0l^ nMH UF^iwtt^ OF F-^fOOM. PrO.UH**lS. 






7.7 The source code of the non-OSIRIS version of PML



? PPPPPP VMM L fOUCATION, UNIVFWSITy OF 

g p V M L G0TFHOV'f>« prrf^He*' IV7/, 

C P M M t 

C P M H LLLLLL 

X TME*PPg*»PA!i*CoMPUT?s*a*^ L^^ilAM.UFcc^iir 

^ FIT TFSTS. 

C A COMPUTES ^HO^J'aHfjNn Ft.,). fUNlSH fN^TlTurr Fl>** FDUCATlONAt 

C FlSc^F^to^M: ^?^^0*HNFuiiV//n DI^ TsFOPlF PSV^OlOGlSCMt^* TfSTS, 

C *<ifn: v«'«la(' hans MiMcw 

C - - 

C st ilr I S f T H V' ^ f uc K hOL^^S UN I V FN V T t F. T 



•i|UF<^OPAPHtn. iNsflTLiTFT PO»<SA«>-INGSHAl^'<AtlK OCH maTeHATIS*^ 

STaTISTIH* <-TucKhOL»^S UN I VFn^ T iF. T . 

C THIS IS THE NON-OSIRIS VERSION OF PML.
C
C THE CONTROL CARDS ARE READ FROM UNIT 1
C PRINTED OUTPUT ON UNIT 6
C DICTIONARY ON UNIT 13
C DATA ON UNIT 11
C 

r iMF^rprro^Tsr/MMrNSIUN STft»rMh%l ^^oUlO I^FFIN*- AN Ai*MAV T^AT IS 

r SHOufn MSC ir k^^C^^^? .Ul &^ Th^ INtr-.-Pw constant tf TMf sTATFMfNT 
C IMSFhUltLY PPeceni^G IHF CALL TO r-^LC^I tN T^f MAIN P*^00*^ft»-« 
OI»*tNSl&f STuP«l«*3000) 

INTFGF^ » A^ei (4^0 > • IlOCU*^'1 • IVftl ( IS) • IST*^( U •VI^OJ • 
IWHOI (60) iv«( cM60) .N2S(6O.f>0) •NWi^O) . iOI (6«U .S^ 

iNTFGFi^o^i L 1^1(60) .M JFf (^0) 
^AL•* ».<*<^ ( ?**>0 ) f f>0) 
Ph Al •H e PS ( ^0 > I (60) .G ( 60) I (^0«6(»> .H I*t^' tteX 1( J«60) • 

1 iHh TA fsii) .t'^'-O*^ •*^Av^ (•iw) • rfA^^-^OV ) IniU) 
NKf yslb 
iHttlNTsu 
!N»- = l 

J wfAn Th( hItTtw- f.Kl) ? I Tl t -v.4Wi)^ 

c 

Pk A\U i ()U.> I L fi^^ L 

1000 FU'-^AT ) 

Wr^I IF 1001 > t t'^H 

loo I FOtt*A/iJ i %\t , / Ifi ,^^(itt,) 

C • ' 

CALL ^'>^ J f ( I VAL • • I CHiT , NKfc r , s^^ I J I 
L Isr«IC = ! V J i ) 
MAXl=IViL V* 
f »Vt*.)l-= ! , 0/ luva I VAL ( J> 
T "100= I VAL ) 

iMPf r = IvAL J"^* 
TFST; I vai t*^*! 
H!NsIVAI I 7) 
tj^l OTalVAL 
fSTAwT=!VMl (^) 
fl- J( = |VAl ( iO) 
MlNSTslVAL (U) 
NOt'Sr I VAt ( 
JAHIL = I VAL I i 3f * 
NUP^= I VAl I 1 

roF srNs I V Ai US) 

c 

C READ THE VARIABLE LIST

CALL f^i |sT U 1st • fvu. I'iUl •MVI 
KsNV 

« IF (NV.'«T,6U.'lw,NV,Lt To ) 

C PPINT i»UT Ihi- C<jN1«''^L I' P O^^^^aT Ig i 






iffi^rjI/'O^^E »nKtN MTMPOLMJON WILL USFC. TQ SPFEO UP CUNVfPG 

^ 1. NOT ^o;jiiCr&Fg''?r?H?'fcfiV,: v^/^tj-^^Hr.r??^!^:^," 

' ^ PrAO ThF HATa . 

^ IF nDrscM,FO,0)STop 

C COMPUTE START-VALUES ACCORDING TO THE VALUE OF ISTART
^CALL ^TApfu>S.K,NP,AONlSTAWT.rM0RM.Tl'=-T) 
C ESTIMATE THE ITEM PARAMETERS

C ESTIMATE THE ABILITIES AND PRINT OUT THE PARAMETERS

^ fALl H»vS(t^S,ePSI,THMA.K,MAXP,tP«n^,N»v,WKiaiST,JAML.M 

C IF ASKED FOR, INVESTIGATE EACH ITEM

^ ui«Ul^^;I;^::i:Cys{af!:^^^^^^ ITr^ST<,HlN.TP, OT,SAVF.G.r,l,., 

C IF ASKED FOR, COMPUTE THE MARTIN-LOF CHI-SQUARE TEST

C SHOUm pr AS LAi-Gf Ab pr^blH fc, ^ ^'""^ ^TAT^^^Lfc^T AND if 

C IF ASKED FOR, COMPUTE THE LIKELIHOOD RATIO TEST

ItKC^ rpc %''^-^f;*:;|Tf^;T.HJ,3) CALL Fm ach 1 ( Mfc thoP , I pp^ c . IF X • 
To 

STOP g/,;/*^^*^ '^'^ ITFMS SPeCIFlFn.) 



SuMsO.O 

NTNO=0 
NNOLLsO 
^•^ULL = 0 
DO t> jsi .K 
00 L = 1.K 



AOIJJI.O 
f 1 .J) xO.O 






C READ THE DICTIONARY AND THE CASES
C INCREMENT THE NIS-MATRIX

"00 20 Irl.K ^ 

3 GO TO 1 

irj JJ^,Frj,u)NNOLI.=NNOLl M 

GO To'! 
li CONTINUE 



inn? *^0^*'^»^>^»MN0LL tNFULL.NiNn 

00 30 



AOI (T ) =A0I n > ♦MS ( J. I ) 

C SELECT ITEM FOR NORMATION
IF .o)r,o To OS 

00 so i = u^ 
no 60 i = UKMi 

DO 60 j=l ,L 

' V ( I ) ( IJJ 
VMOi f IJ) =IbK^^ 

N.jWMr (K ♦ 1 ) 

r,<^ TO 

f^O t7 1 = 1 

ST /V g ] <^ 
6V CuNTlNoh 



OF 



pPOP = Ani ( I) V i .o/MMj, 
CA( I '.NAf't ( I •NA^'h ) 

loos FORMMCONOf-MATlON ON VAklA^Le-^Ib? ^^''''""^^ 
C PRINT THE NIS-MATRIX
wPITF(6.100h) 

NPP|N;=,KMi-i,/,e.i 

MsIPPINT* IH 
lOlO FORMAT (• VA^^^NOM 






^ UV OU Js 1 «K 



t.P|NT»,PR,.T:r . 

no sa 1 = 1. K 



100 CONTINUE 



^ SUHWOUTlMf bT^PT UPS. ^•Nk,AC)NNFjST£,Nc»wM, LIST) 

C COMPUTES START VALUES FOR THE ITERATIONS
PEALjtt fPb(l) .PSUMfDFL 

1001 FOPMAT (/////I 

IF fNEjSTA.Fiv. 1 )G0 TO 1 
PSUM=U, 000*00 
OFl =0. 000*00 
no 10 Jrl.K 

PSUMrPSUM*AOMI)»l,OM.OO 
DO ?0 1=1. K 

DEJ^i/pS(NOPi!,)''^'^'^'^-"^^'^ 

no ^l i = nk * 
^l ^o^ff^or^^^^^^'^-L 

. Pf TuPii 
1 DO 30 1=1 .K 
JO FPS(n = l,uOf».OU 
Pt TUPN 



! ►''^ =</ . (jpti 

3 J r.^IFrNri*- *K JIFT { T ) 

. , IF(^}r.I'^.f U.0)Ol) to £U 

«*0 Pw = PP«»f I ( I ) 

O'j '-O I = 1 .K 

f PS ( T ) re I f I ) 

- n I • T ) =f .^s I ( n 

^^»« M ] ) P^ ( I ) 



SU 
C 



c 
c 

6iJ 



C COMPijTF TMt n.'y iMCrf) 
no 60 1= 1 






C COMPUTES NEW VALUES OF THE ITEM PARAMETERS IN THE ITERATION
I^TEGe^< NP(iJ tAOI < 1) 

,epsni)co,ouo 

DO ?0 Jxl.K 

2u FPSl { T ) =PPSI ( I ) ♦^'W u) *^I ( I • J) / iS, J) 

10 e^'st ( I )=Aoi ( n/tPS! » I) 

O^FPSl (NO«M) 
00 30 I=X»K 
KOIFF(I)sO 

IF (OA»iS(FPbI < l)-fPS(I) ) •bT.t'^*^OW)KDIFF ( I) rl 
3U FyS(I)=EPbI (I) 
WJ- TORN 



SUMWOOTiNt (»AMMA a PSfOffrl ,N) 

C COMPUTES THE SYMMETRIC FUNCTIONS WITH A RECURSIVE FORMULA THAT USES
C SUBTRACTIONS. THE ALGORITHM IS SENSITIVE TO ROUND-OFF ERRORS WHEN THE
C NUMBER OF ITEMS IS LARGE.

C THE ROUTINE IS TAKEN FROM FISCHER (1974).

Pf- AL*H r, ( 1) (N.N) 

C GIdtJ) 1 = 1 ti J = fN...»N-i GPUNOFUNKT ION J-IfP OPONONG OHNE I 

NOPMs (^4♦l ) /i! 
NuPMl sNOkM* I 
r. ( 1 ) =0. 

IK' c^on i = i«N 
( 1 . 1) = I . 

^O0 G ( 1 ) =0 ( 1 » ♦F^S ( I ) 

no ?3o j=^.^uPMi 

^lO GK I ♦ J) =(W J-i ( T f J-1 ) •tPS ( I ) 

fW J)rO. 
IK' ??0 I = l.r>t 

J) =G U> ♦GI ( I • J) *FhS ( I ) 

?an j>=G(j)/Df loat (J) 

Tf'*^T =(,(N'OPMl ) 
G I N ) = I , 

no ?su l^i ' 

G| ( I *NJ = l\ 
00 J-i«N 
IF ( I .F fJ. J) 0<J 
GI <i,N)=^I (i.N)*>FPS(J) 
^4 0 CONTINUF ' 
3b0 G (N'> =fWN) *FPS ( I ) 

0'> ^70 J=l .Jl 
G (N-J) =n. 
n() 1 = 1 

G (N-J) =(MN-J) ♦GI (I •N-J* 1 r 
r. (N-J) =C (N-J) /Cf LOAT (J) 
PC ^^70 1=1. K 
?70 G I ( I .^-J) i (S (M- J> - SI ( i .N-J* I I ) /Ft^S ( I ) 
T|:'^T = U-7f-ST/r,(N0PMl) 

IF rh^^^s M ) - i .u-** ) Ji u. Ji o.-eMO 
<;top 9^o 



C THE ROUTINE SUPERVISES THE COMPUTATION OF THE SYMMETRIC FUNCTIONS
C AND THEIR DERIVATIVES WITH THE SUMMATION METHOD.

»f AL'»« f- PS ( I ) ( i ) .Hi (K ) •sTiWt' 
K M 1 = K - 1 

00 10 1=1. K 
STOwj =F ( I ) 
F PS ( I ) rO . UGU 

1 n^Pf V. '^.\M r ALl ' .iM j J. , (,) 
IM IPPK .Mi. 1) C Al L t'/i^'M^ ^S.K 

0 n 1 . I ) = 1 . Oirj 

O'l r^U J=l.KMl 
?U GI ( I 1 ) =G( J) 
Fi^Sl I ) =STOP^ 

CALL r-AM (F Pb.K .6) 
»^ TORN 







c 

C THE ROUTINE COMPUTES THE SYMMETRIC FUNCTIONS WITH A RECURSIVE FORMULA

C THAT ONLY USES MULTIPLICATION AND ADDITION OF POSITIVE NUMBERS.

C DOUBLE PRECISION ARITHMETIC IS USED.

C THE ROUTINE IS TAKEN FROM FISCHER (1974).

c 

QML^b f f-^ ( i) f F I U • A (60) • Y (bU) 
no 10 Isi^K 

' y ( I ) =E ( 1 ) 
DO 30 Isf^K 
X<I)sO,UiHJ 

V u) < I) ♦E n ) 
no i'o jsffi 

<?U Y ( J) =x < J) •« ( j-n *»M ! » 

DO 30 J=UJ 
30 )'(.i)rY(J) 

DU **0 J=1*K 

*o nj)sx(j) 

«tTURN 



C THIS IS A SINGLE PRECISION VERSION OF THE GAM ROUTINE

REAL*»« f PS( I I ^Gi ] ) 
PEALO<» F (*>0) (60) (f>U) 
•DO 10 1 = 1 *K 
F(I)=SNOL«EPS(n) 

^ ( 1 )=F ( n 

DO 30^|s£'.K 

^^s^ 00 dO J=?*I 

20 Y ( JI 5X ( J) ( J-1 ) ( I ) 

DO 30 j=Nl 
JO X(J)=Y(J) 

O^J J=i»K 

^0 . r, r j> =OHi M A ( J) ) 

Pi TUP\ 
ENC 



SUM^^OijT JM«: AI TKe N( P<^,LPSI ,G«^,I .<^N^,AOi f^^POP^^'OlFFtLOOP. 

C THE ROUTINE COMPUTES THE AITKEN EXTRAPOLATION TO SPEED UP CONVERGENCE



Of ftL*M A ( 3,K > ,e ►^S ( 1 ) ,f PSI ( 1 ) U.) aw (•< )',H,HO 

iNTh r>r w ^^^y { i ) , fl(>I t 1 ) 

iKTh r.f P«»(> »< ,,IFF ( I ) * 
I^ ( I TFP,L T ) (Ml T(j 

LO^PsLO'^P* 1 

r tL L TmmPov ( b PS PS 1 •kO I FF •K • A»n iP, \f)^M) 

Mi> 10 I = ifr 

f^r.ir=Nrw^ ^ki. if f ( i ) / , 

1 (j A ( I oOt^ • J 1 =f- •-•b f I) . * 

IF (riHTf ,t V) ;t^^y^, ^ 

If ('.OOP, I T , j> Pf TUP^ 
n ) ^o 1 = 1 .K 

If (KOlf »^ ( 1 1 'j.u) r,o U) eo ^ * . 
Hf)=A { 1 • I ) -A { n 

H = f APS(hrj/iA(l«I)*A( ^^p-t-^Atr^^I))) 
(M»nT , 1 , 0(1 1 ) H= 1 ,01)1 

f PS f I ) =A r J« ! ) ♦HO*H 
<'J CONTIhOF 

no 30 I = Nk 
iO ft<NI)=FPb(I) 

LO0P=l 

Pf TuPN 

? CaI L I^PPOV ( f. PS. f PS I ^r. .(i I U IFF • Al' I 'NK^f wwow, NOPM) 
Rf TUPN 
ENP 



SUMwOUT WS ( f .fw TmP I a . •'-♦ax .Ff- HLf w,NP , W** 1 • L I ^ T . J AH I L • <i ) 

C ESTIMATES THE PERSON PARAMETERS ITERATIVELY USING THE NEWTON-RAPHSON
C METHOD. THE ROUTINE IS AN IMPROVED VERSION OF THE ONE PRESENTED BY
C FISCHER (1974).
r 

wf-Ai «H t i jV,, < 1) ,Tk^f T/^^ i) ,F^HL'"^«SU«ni'«-*K K U*') \ ) * 
Ilr;f' • / I t/^r .L'*'. 
PmT K f P ( 1 I 
PmT^ r-F P*>^^ LI 1 ) 
K K - I 

If f 1 ) r. ) TO 1 

\f { ^^.A »F 1 ) .u Tn f I' 

C 

C DETERMINE THE RANGE OF VARIATION OF THE ITEM PARAMETERS
Z1=E (1) 

no ^ i=/*H 

If ((- < I ) 1 

1 /UM!) 

IF ((- (n-/«')<».<»* J 

3 /r^sFlI) 

<» CONTINUE 1/1-1 

2^*0L0rw/^)-l)Loo(/l) 'X^X ' 



/ 

- I^PC22.LT.2.U(.0)viO TO h 

e ^VtS^'JlpSi^y^p^^*^^'^^ ^^^'^f*^ ASSUMPTION OF eOUALi:Y S^ACfn 

' 00 b I«i»KK 

o;n3DFxP(Z^«(^i-,c,f^,o, , 
b g<I|50^)MUoou-oE.P(^^l•7^))/(l.o„o.nFXH(-/^•,uoDO./l,,, - 

C COMPUTE START VALUES UNDER THE ASSUMPTION OF EQUAL ITEM PARAMETERS
S CONTlNUf 

^ no 0 1 = 1. KK 

«^IF»0 

DO ••'O \ ♦KK 

bU=0, 

DO JO T«lfK 
^0 Su«bU^E(n/(K.F(I)»D(J)) 

*0 ThPTA(J)=OD/SU 
^ 70 IF(MAX♦rO♦l)^^ETU«^ 

c iS^ol^iiriSN.^'''''^''^^ ^^^.^^^ i^^" pa^amft^^s amo pwint out thf 

Call IT iriF0(*K3.rfNP»K»L 1ST) 

FFHLEI- =0,DO 

SU=0,DO 

00=0, DO 

NJNOsO 
OJ 110 IrUKK 
lNFsO,Ct'*00 

111 

Lor.=OL^o!u!i)r''^'^^^**"*^ ^^^'^^^^ 

^r> = n[)*^H ( I ) »L0^*«? 
NlMi^ = MMn»r,w ( I ) 

^i•=L0^ri•^6*INF 



112 i'^* 1 OOi^) I •[) ( n •LOGt INF • / 1 



loOJ J FO*;;'; A •Tj.UKUM^FK np iTf^AlJONS F^m r ON wr,t MCf OF Tnt AHILITIES:*. 

nri=l(>0» (Suo^r'/NINO^ > / fMNO-l ) 
F<- HLF» = Ffc hl^ ^/M IND 

Ft M» rpr U^^^-f f HI f ) /f,o 

iHrVftllANrnlF?,!??^' ^^^'-^ ''^-^'^ ^^'^^^ IS«»F6,^». WITH T 

IF (K ,r,T, io) ^0 TO IPS 

NOl L = 0 

W^T TF (f,. H) 1 jj) r^oLL • ( I • I = 1 f K ) 

w 1 r F < 6 ♦ 1 u 1 i» I 

Qf^rl ,0L>0 

DO 130 J=Uk 

71 = 1 ,Ono*t (J)«fj( I ) 

130 COMThUF 

r-o 1^0 j= 1 

1*»U TnDAi )) =», (J) **i> n ) *• J/\)l) 
/Ul .Ot u/^D 

^ •'^ITMfe. KMS) I ,( ( I ) ( TMf-.T A ( J) 0 = I ) 

l^'O CoNTTNUt 

•'^^ I IF (6» 1 1)1 U) 

fJU ISO I = I .K * 

ISO h^in (^>t loi I) U(>{i) 

^.'U wMlTJ i !>') ? ) 

1U07 Ff.'JM/.T ( • I • ♦ ^,^x♦ • I Tf ^' P Ah A*^M h ►^S » / / ^ t ♦ » Hf^ j I ^ ' Jl >f^^^ ft T I ON J MiC T 

IWKATMN ^'mi PU( T M.H^'ATlON H">)M 

<?10 fc^vJTM^OUUrt) I I'> T ( IJ ♦ (wK 3 ( I ) , jr 1 ♦ J> 

lOlO ^-fOW-lAMM Tf-K vALOl-S ''f^ 1h{ SrMMM^T^v|c ^^U* f T T.)NS • / • OkOI-w« 

lull F ijUMAT ( • 0 • t n«r V. 1 ^ ) 

lul<? F.iPMAT ( M • • ISA, •Pm^hAj^ IL IT f OF ' )h J J 'j | f|<. fl ft^T/lfr hA* sfr.^* GiVMj 

1 THt •^t^^^f'N I Ah A^'t Tt ^ • //^UX t 'kAW SC.i»^t-«) ' 
1013 F(pPMAT ( • (M < I^A JlS**- I / ) / ^ ISl ) 
ICl* FOPmAT(« ^COt^ PA^Af-'TM'M 

lOlS FoUMAT MC • ♦ I .Uf 1 USf )X .^ilj- '^.^♦S ( / IM* ♦^Om I 

FNP' 






C COMPUTES THE STANDARD ERRORS OF THE ITEM PARAMETERS AND PRINTS OUT
C THE INFORMATION ABOUT THE ITEM PARAMETERS.

RfAL^W ^P<^<Jl(3fK)♦PF^*5^!)♦INF♦;!i♦/^ 

JNTFGfP NP(i) 
MEr.FP*2 LlSTil) 
mUk~1 
WHlTF(6«l00n 

. no 10 i«i.K 

INFsO.OOtOO 
00 ^0 JslfKMl 

10 KPITt (h»luO/f) I 1ST ( I ) ♦ (KKi( J» I) f jsl • J> •INF»ZN?a 
RETURN 

1001 FO'^^'AT ( • I • • HUAf MTF^ ^^APA^^ETF^'^•//•iX • lUNp r NOPmaTJON M'^OOUCT NO 
IRMATIOM, MKMCUCt N0WMATI0N(L0G> STANOAWO EPWO^f CONFIOENCf InT 
2Ef'VAL(9b *)•/• VAM.NO*) 

1002 FOPMaT (•0».Ib»F»*,t>fFlJi,!>,Fiy.S»F^*.S,FlQ.b*Pl?,b«) 
ENH 



IIOUT?^^^'"^ lTTFSTnHlN.IPLOT,KPS,Gir,I,K,NM,MS.WKa,»,,LlM,NUHb, 

RFAi«*i Fpsn ) .CiU ) •r,i tK,K) 

IMTFGFP N^^n ) .NIS(K,K) ,NAMF (6) 
INTEGFP^^a LlSTd) 
* 0IMEN5T0N A(<»7)«wK?<K,c')»»a» 

23«MXxJ / PWO'-»PUivf t^.lON ••b*« ' ••0. oil. 0 0.0,1,0. 



M=a 
IN 
KM 

C START THE LOOP OVER THE ITEMS



iNCsl 
^1=K-1 



00 10 1=1 •K 
Nh=0 

IF ( IPlN.t'J, nr,o TO if 
CALL ^NA^'t ( I 'NAMf ) 

wfy iT^ (^. luol > l 1ST 



.vw* , L 1ST ( I ) .NA^J- 

1001 ^>\»'yAPI/i^LE •,^A<»/»OSCOPF f,POUP FPEOUENCY 



2-V*LUF •/) 
C 

C LOOP OVER SCORE GROUPS

^ Do 20 j=l,K,-^l 

IF<NP{J)-N0HS)iNil,i2 
U NhsNR*I 

GO TO 20 
1? ^CQNTlbJUr 

*( J-N^) =EPS ( n*r,I ( I , J)/G<'J) 

Wk2 ( J-hHf 1 ) ( J-NH) 

WK2(J-Ne»c') =.NiIS ( J, T ) •! . U/NW ( J) 

IF ( IPIM.E'J, DGO TO dO 

M0HS=NIS ( J, I ) 

PP = i» ( J-h8) 



IF (WK2 U-NH« 1 )-WKr'( J-NHf 2) ) J , <• . 

MORSsNP ( J)-MubS 

PPsi.O-PP 



* P=nprw TN iyr..s ,NH I J) ,L.p) 

F^Sjj?;;;-??::;:^^^^^ 

IF (IPLOT.Nf .OIGU TO lU 

CALL '^NflMf ( I • NA^t 1 
wini S^^nMh, l0U3> I l<T ( 1 ) ,MAMt. 

lu ^{;;;(f*^^t;^*'<»o.,.VAPu«Lt No.,m,^.: .,ca.) 

RLTUPN 
END 






iNTgafp WAPUhlt »H0M fUMNlATlvf. PuoHAHlLIlr TO fOM.urf 
N iNTEof.H NUHHtP OF TPt«LS In ThE DISTHFUtlON 
»» BtAt HKI-SAHILITV Of SUCCFt.S iN ONf T^Ul 

SlsO.O 
TsO.O 

IF C JQ.FU,U)cO TO 16 
I f 0 

no u isi-jo 

CONTINUF 

IF ( JO.ru.NJfiO To i?6 

c 

s^=s 

CONTpNlue 

T=T*n 

CONTlNUt- 



iH^ ThOfM c 'L IST •< lOP •^^wAOt^M •N 1^0% w iKF ) 

C THE ROUTINE ADMINISTERS THE COMPUTATION OF THE CHI-SQUARE GOODNESS

C OF FIT TEST SUGGESTED BY MARTIN-LOF (1973).

C 

IMe f^f L 1ST { i » 

MTe r,^ MS ( ►< ) • f'lJ < 1 ) 

Km I tH, T»-h T f ^ Al F .F- L I^^ 
OfAL«*<» M {W (fjTf^Af) • 1) tW ( 1) 
If M^=0 

KVl=K-l 

Cm1=0.UOO 
•*kT TF I 007 ) 

lui»7 rot^'-'A T I • 1 Tnt "^C^^k l^*VOU^JS CONTwihUTi- Tn Tnt CHl-SOUAWf SUM AS FOLL 
lows: •//• SCnkF '^KtiUP NuMMtM Of o^s^- >^vaT in js contmimutionm 

c 

C DETERMINE HOW MANY TIMES IT IS NECESSARY TO FILL THE MATRIX WHERE
C THE SECOND DERIVATIVES OF THE SYMMETRIC FUNCTIONS ARE STORED.
r 

111 NLOl>»^ = '^^l/MH*An 

IF (Ml 0<i»'.e O.U ) G^i 10 Ul 
NhAO = NTP Aff 

c 

C CALL THE ROUTINE WHERE THE COMPUTATIONS ARE CARRIED OUT.
f)0 1 40 LO(>^^= I tNLr.OP 

CflLl ST(>^^va(^•ps,^ ,r,i.K«t<Kl«vFK»'^<^« VAPKf>v«MTMAD»r*^I •NPAO^LOOPt 
Iks ♦'sTOH • I (►HMtNw, ] s, iHt^f (, , * , U h?/^ I 
li- ( IFW^.rt- .0) iu>^'. 

c , 

C IF THERE ARE ANY SCORE GROUPS LEFT, TAKE CARE OF THEM.
C 

Ul IF ( I OHf, -KM) 30* 1 JO 

l^? LOO^^^•<M1/NT^AO• 1 
N^AOcwi - i()Kf ^ 

CAI I ST.JHVA (»• ►'St (jf F.I •K f KK I f VFK f V A WK () V .NT W 40 • CH I •NPAD^LOOPf 







CAiL MOCOf nCHNMOF.ALF.lfiP) 

10U3 FOHMAT(////« THf MA*iTlN*L^F ChI-SOUAPF (,0(>l)^f^SS 0|. FIT ThST oive^ 
lCHt-50UARe«« tf^U. J» • WllH^tltot* OfG^<Efc"S OF F^^FF'^O^. He • ,F ? ,S. • , • ) 

C COMPUTE THE REDUNDANCY
C 

Hs-CHl/(?,0»FLl^f ) 
mHITF (b*100H) H 
iOOH FORMAT ('OThE KEDUNPaNCY l'3:«tF20,7) 
Rf TURN 
END 



      SUBROUTINE DSINV(A,N,EPS,IER)
C THIS SSP-ROUTINE INVERTS A SYMMETRIC POSITIVE DEFINITE MATRIX

c 

REAL*e A( U tOlNtWOPK 
CALL OMFSD(AvN»FPS»lFP) 
IF(lER)g»l»l 
1 IPlVsN»(N*l)/2 
iNDoIPIV 
DO 6 IsUn 
01N=1 .DU/AdPlV) 
A(IPIV)cDlN 

KtND»l-l 
LANFsN-KFND 
IF (KFN0)5fSt2 

^ j=iND 

00 K = 1»KEM) 

WORkxO.DO 

HlNsMlN-1 

LH0H=1PIV 

CvFRsJ 

00 3 L=LANF»M1N 
LWF>^=LV|-k*i 
L^OHsLhO>V*L 
J WOffHsWOWK^Aa Ve^v) OA (LHO^) 

A( J) =-NORK«DIN 

b IPl VslPlV-MlN 

6 lNO=lNO-i 
00 6 1=1. N 
iPlVrlPlV^I 
J=1P1V 
OO^JtJ K = 1.N 
WOPKrO.DO 

LHOHsJ f 

00 7 L*K»N 

LVF>^=LHOP»K-r 

>»0PK=wOf<K ♦A (L HOW) • A (L VEW) 

7 LHOPsLHOM^L 

8 JsJ*K 

9 RtTURN^ 
ENO 



      SUBROUTINE DMFSD(A,N,EPS,IER)
C THIS SSP-ROUTINE FACTORS A GIVEN SYMMETRIC POSITIVE DEFINITE MATRIX.
C IT IS CALLED BY DSINV.



10 
11 



P|-AL<»e A ( I) •''•PI V.IjSU« 

IF<N-l) 

UP = 0 

K P I V = 0 

OO U K = NN 

KPIV=KPIV^K 

lrjn = KPi V 

TOl =AHS <FPS<'SNOL I A V ) ) ) 

00 U l=KtN 

ObUM=0,r:0 

IF (LFND)^.*»«<e 

00 3 L^NLEwO 

LANF rKPI v-L 

L JNO= INO-L 

O'^UMsOSUM^AlL ANF)*A(L INU) 
OSUMrA ( JND)-r,SUM 

F ( l-K) 10«b« I 0 

f- (bNGJ (!jSUM)-TOL)^.^«^ 

F(nSUM) 

F (lFR)b«8»^ 
1FR=K*1 

QHI VsOSOPT (LibUM) 
A(t«PlV)sUPlV 
OPlVsrl.CO/OPlV 
GO TO U 

A(lNO»»USUM<»pPlV 

iNOslNO* 1 

PtTURN 

1ER=-1 

RfTUWN 

END 






KF Ai #H r>>Sn ) •Ad ) ♦(,i?< n ♦ vfcK ( n .VjMK^K) ,4101 .STO NCmI titTat 
lVAknov(p 

«tAL«'* ST()WrJTkAO«K*< n *•» ( 1 ) 
!NTtGEP fNIS(K*K) 

KMl=K-l 
KHr »K •>£ 

NLOOP«KKi/NT*VAD 



^ (LOOH.GT.UGO TO 11 

C COMPUTE THE SECOND DERIVATIVES OF THE SYMMETRIC FUNCTIONS THROUGH
C REPEATED CALLS TO THE GAM (OR GAME) ROUTINE



C 



00 10 I=?»K 



. OIsE*'S( J > 
^T0J=EPS(J> 



FPsn)co.oo 

IF(lPRFC>17.17f 18 
17 CALL 6AM(f PS•^.G^) 

GO TO 19 
iR CALL GAME (EPSfKfGi?) 

19 l»(l)=0, 

00 ^0 L0H0=1.KM^ 

20 W(LO«D*cf)sG<f(LO^'n) 
FPS(n=STOI 

EP^i( J> sSTOj 

C 

C COPY THE INFORMATION FOR THE SCORE GROUPS BEING TREATED IN THIS LOOP
C INTO THE STOR ARRAY

c 

IuPU=NTPAD» <L00P-1 ) 

00 30 L*^=1.NPAD 

lO^/OsIOWD^l 
30 STOP aP» JL ) =» ( KiPD) 
C 

C IF NECESSARY WRITE THE INFORMATION ON THE SCRATCH FILE

IF (NL6i>P.GT,U> teftlTEU?) (»<<KLO)»KLO=l»KMn 
10 CONTlNUfc 
Gl> TO U 

C 

C READ THE SCRATCH FILE AND COPY THE INFORMATION FOR THE SCORE GROUPS
C BEING TREATED IN THIS LOOP INTO THE STOR ARRAY

5l RFWIND \^ 

      DO 110 I=2,K
      L=I-1
      DO 110 J=1,L
      JL=JL+1
      READ(12) (W(KLO),KLO=1,KM1)
      IORD=NTRAD*(LOOP-1)
      DO 130 LR=1,NRAD
      IORD=IORD+1
  130 STOR(LR,JL)=W(IORD)
  110 CONTINUE
   12 IORD=NTRAD*(LOOP-1)



C     LOOP OVER THE SCORE GROUPS BEING TREATED IN THIS LOOP



00 ^♦O LP= 1 .NPAO 

      IORD=IORD+1
      IF(NR(IORD)) 39,39,41
   39 KS=KS-1
GO TO 

C     COMPUTE THE VARIANCE-COVARIANCE MATRICES

c 

^1 JL=0 
JK = 0 

00 t>0 J=l .K 

U (L.EO*J)GO TO Ul 

VAPkOV ( JL ) =Nk ( lOPO) •e PS( J) <»t^S (L) *STOP (L^J» ) / G ( Tnwp) 
GO TO SO 

1<»1 vapkov ( JL 1 =t*^S ( J) *r,i ( j» lOPO) •raP ( lowp) /(, ( T(^^/f^) 

C     COMPUTE THE DIFFERENCES BETWEEN OBSERVED AND PREDICTED FREQUENCIES

      VEK(J)=NIS(IORD,J)-VARKOV(JL)
   50 CONTINUE
C 

C     INVERT THE VARIANCE-COVARIANCE MATRICES
      CALL DSINV(VARKOV,K,TOL,IER)
If (U^.l 1 •0)(i() TO ^hO 
IF (IFP,f,T ,0) ,0 TO ^10 






DO 60 |»1,K 



DC TirfLL. ^ 



.0.3 f s:t?ib^si?i?::iJr^ 



\ mi'. 

C     SEEK FIRST A SUITABLE GROUPING OF THE SCORE GROUPS
C     GO UPWARDS FIRST
^ 00 10 1=1 

10 Ar«>(i)=o 

'^PT=0 
<? IOP^=IOPf/-l 

C     TEST IF THERE ARE ANY SCORE GROUPS LEFT
^ IF(Inpp.(L0Pi>-n)3.JtlVV 

C     ADD A SCORE GROUP

3 DC* <rO I = 1 tK 

Aor,{n^Af)(,(n*r.is(io^^n*n 
c * 

C     TEST FOR ZERO AND FULL PROPORTIONS

IF (40.,f I ) iL-,£',^ 

C     TEST IF THE NUMBER OF SUBJECTS IS SUFFICIENT

IF Ui^l-^lUbU 
C     A GROUPING IS ACCEPTED

Ni »P I =fjr^k 1 . 1 
^ KOOI 1NOPI) = IOJU» 

C     REPEAT THE PROCESS GOING DOWNWARDS

(jO I = 1 tK 

*U A(.G(I)=0 
NPTzO 

loa LOPo=LOPn-i 

103 >^'n(-O|i.-LO«0.10J.i0J.i.v 

bU AOf> ( n sAoG ( I ) ♦M^; (L I ) 

^ -^^!i*»^J=^WT*f4P(L0Pu) 
C     TEST FOR ZERO AND FULL PROPORTIONS
DO 1 = 1,K 

^ ^ IF (Anr,( 1 ) ) hv* 111,', i?v 

1^9 IF < AOG( I )-riMT )hO. 10^. lu^ 
e>U CCMTlNjUf- 

IFfMPT-MlNSl) IOi^,lr^j,l^J 

KOOL (NOPI ) =L'lPf. I 

IM ( U)»vh* I )-l.OPj) 1 <»*<?0*» 

1 I^UlsSl^^ro^^^^^^^ ^^'^^^^^ ^COPf GPOOPS IS 

  199 CONTINUE







KOOL (NAhi ) ri own 



LIGHio 

C     ESTIMATE THE ITEM PARAMETERS WITHIN EACH OF THE GROUPS
DO 300 UUN^.f^i 

no 310 jxi,K 

3J 0 J NMG( J) xO 

LpW = KO0t (Nr.KL-|*l) 

FLOT=FLCG/f LOT 
100? FOPMATcO^.M.' - V^-,.,>;.7'^^'< ^HSF^/alON'... 



c 



^>^;«MOull^^ m I st a 1st.. jn*^. lour . j) 



f fx: 
c 




^9 



100 f O^MAt (^UA4l•Tl •«fO(T 3«)X) «t 1 «^U( iXtAl) ) 

i^vTIf nOUTtlUl) CAPO 
DC ll»l«<»0 

TrndKKi.TecH on 

GOTO 10 
,b COhTTNUk 

LIST|J>«IX 

GOT(» i 
STOH M 



GOTO 
GOTO 



?0 
V9 



1 1 



Po<;, 



nli^/^'^icT '-''^!*^^">*MsTH/jiM.LrST0(3(j)W TSTf (IS) 



APPh « « 

^ . n^sc'/ 



Nl = NUMhtP uF FLfwfMS IN lISTa 

C     DEFAULT-VALUES
IVAL( l)=l 
IVAL(?)=^bO 
      IVAL(3)=3
      IVAL(4)=0
      IVAL(5)=0
      IVAL(6)=0
      IVAL(7)=1
      IVAL(8)=1
      IVAL(9)=0
      IVAL(10)=1
jvALdDslDD 
iVAL r I?) x5 

IVAl (IS)si 

Au ( iNt^i 10 1 J L ISTC 
tw'^nn lOOl . 103) 
liO 11 IsltlS 

no* 10 j=i %M 

IF tLlSTC ( I ) ♦KO»L 1ST A (J) ) »t =j 
CO^jT I»^Uf 

IF <K ,ro,0) tjOTo 1 I 
^ = 1 ISTUCK) 
L=L ISTH (K) 

I i/AL <M) =L 

w*^ITf moot* 1';4») list a (K) 
WFAOflNP^lO?) LiSTr 

"MLlSTCd) .r;f .0) I V Al ( ) =L 1 S T C ( I) 
F { L TSTC ( .tjf « 0 ) ! ^/AL * ,M =L TC ( <^ » 
MLlSTC (?u^/f .U) IvAi Ml;=USTrO) 

r (I If Tf f4) .U) I i/AI ( Ir'l XL [' " " 
F (lISTC <S) .Nt.O) r VAL ( U) sL 1 






: 1 1 ) ,Ne.(j) 



LfSTCi I) 
LiSKt>l 
LtSTC<3) 

L*STCISI 



rSlPMATcSofOUO^iTNb HA^AMtTFWS OV^MIOfS ThF AFf A' il 1 S : • // I 
FOPMAT f*U*«^X«*^AXI«*tl4) 

FOWFAT ! • • •P.OHS«» • 
FORMAT (• ••*A«»NChS« 

      END



SJ»*^<OUTi;w*^ '^HOK<l isnlC^MV.INPf laOT.L 1ST I 

C     WRITTEN BY JAN-GUNNAR TINGSELL, DEPARTMENT FOR EDUCATIONAL
C     RESEARCH, UNIVERSITY OF GOTEBORG.
C
C     READS DICTIONARY AND DATA FILES.
C     THE ITEMS MUST BE DESCRIBED IN A DICTIONARY FILE ON UNIT 13

C DATA FIIF UNIT ^ ^ 

C     DICTIONARY DESCRIPTIONS:
C     POS 1-3   VARIABLE NUMBER
C     POS 4-27  VARIABLE NAME
C     POS 28-30 COLUMN LOCATION IN DATAFILE
C     EACH CASE MUST BE INCLUDED IN ONE RECORD
C     FIRST RECORD IN DICTIONARY FILE CONTAINS:
C     LRECL, STARTING VARIABLE NUMBER IN DICTIONARY
C     AND ENDING VARIABLE NUMBER
C     FORMAT (3I3)
C     MAX. 200 VARIABLES CAN BE DEFINED IN DICTIONARY
C     THE VARIABLES MUST BE CONTINUOUSLY NUMBERED BETWEEN ISTART
C     AND IEND.
C 
C 

C * 
C 

c 



c 



I^^TtC'f^ ^'tM^ t^fOO •bUNAMN *6» • V < n • X ( 1000) ,|)ATF IL 

      INFIL=13

DATflirU 

C     READ DICTIONARY FILE
      READ(INFIL,100) LRECL,ISTART,IEND

irilJ ( INF IL • 1" 1 it r*0= IS) N^VI n • (NAwe M • Jl . J=I ^TlOC ( I ) 



I = IM 

IF ( I ,r,T .^OOi STO»^ 1^ 

M>To lo 
1*- CUNTINUF 
r 

C     CHECK VARIABLE LIST AND PRINT DICTIONARY

^ IF <w tSitK.f .*.0) v»Pl If ( lOUl liO^^* 

nt' c-u i = l«Nv 

lf\\^MllSf^^^T.;'•H^.J.(.T.UN^» r.oTj 9u 

I i = L ist/hT. I 

O U = IS t AWT • I f 
IF ( I^.f w. J. V^'.L IS'MC.FO.O* I TF ( F UT. iO-^) NW ( 1 n • 

• r.At^ M 1 1^ I J> • I 3- i .^^ •Tloc M 1 ) 

^J C'N? INUt 
TL^^\ 

WW I Tc ( piul • 1 0* ) J 
ST^^- lb 

      ENTRY CASE(V,KL)
      KL=0
C     READ ONE CASE AND RETURN THE VARIABLES IN V.

^1 aim;»atf iL . iOb.Ef^i)='<0) ;x ( I ) . 1 = 1 •tf^'^CL i 

[)'^» 1*1 •t4V 

J^.=L 1ST (n-IST/s^'T*l 
H>r:=TLOC ( Jb) 
t'S v< M=X(tOC) 
      RETURN

3V ><L = l 

C     END OF FILE

FnTwY (jNAHF < 1 tNA^^N) 

C     GET THE VARIABLE NAMES



• 6 

   50 NAMN(J)=NAME(II,J)
      RETURN
100 roPi^AT < 1 01 J» 
j (I) FORMAT 1 I ,u<SAw, f i) 

1 "VAP NAMM • -^/^ • • Tt l»r •/) *' 
J 0 ) FfiPMAT <• •.IUl;t.fAH./>>«IJ) 

1^4 F«'»MATl •OVA^'U.ht »^ :••?<». • NOr I N 0 I f I I ONA^ Y • | 

lus rnwM/ T (ifbo: 1 . 1 1 • "Sc I : .^sji n 

      END






bUnWOOTlNP ►'LOT T ( x . y • J hC 1 1 A • A • I <)U U 

D!Mf NSION *(l).YlU,n,MisO)»T(M»lY(r') 
LOGICAL* 1 MTf< MOl .Si) •Mx(Sibn .ML ANMh) • Cn { 

1 (T ( n • Tf fK n ) ) ^ ' 

OaTa aL/« •/•T/» lA-* I • • |t)78« . '^O Ml/ 



c 

c 

^ IFlM.Nfc .2) GOTO 98 

C     BLANK OUT THE PLOT

c 

^0 .M* n ) =Hl ANK ( 1 ) 

C     X AND Y-VALUES ALWAYS >=0.0 AND <=1.0
C     JUST 2 FUNCTIONS CAN BE PLOTTED

C     CONSTRUCT THE FRAMES

00 n 1 = 1 • lui 

MTf^ ( I t I ) =Tf C (3) 

MTp ( 1 . ji =TrcK (S) -V 
, ^ IF M ij-n/s) «s.f u, (.j-i) ) MTwn • J) =T^CK (4)^ 



c 



no ^S 1 = 1 ,N / 
IF U ( I ) .L T.O^O. 
IA=100,U«A(I)U1.S 

IF ( Y ( I • IH) .i T >o,u,nH . V ( I , I 



I A = l 00,U«A < !){♦ 1 .S 



, - - , , , - ilM),r,T,i»0) GOTO 9^ 

I T ( I M) =1 U{).U*» I Y ( I , IM) / ?) ♦ I .b 
<?i C^f T 

IF i I Y ( 1 ) .FO. 1 Y (r'J) M)TO ^3 

I IV=IY(|M) 

ite htp ( Tx,ii>) = iFCKnM) 

IIY=!Y(1) 



W^Mf ( I( u f • 1 UL ) < A M ) . I = Jl t^O) 
0*^ iO J J= \ • 'vt 
J = S1 -,fj 
1 = J ♦ i 
* ./ = f . j/ 1 ou . 0 

!»• ( ( j/^} ..J) v-^l Tf ( I -)uT • 1 01 ) X J. (MT^^ M i J-^l) • ] -1 • 1 0 1 > 

C^'rinnf"'-^''-^^ ^-iTMIOuT.lO^) ( M r P ( r • 1) • U 1 1 1 i l! ' 

K ^ ( I - Ij T , J I 1 ) A J • ( M T ( I « 1) ♦ I r 1 , 1 II I ) 
w»-lTr(|iuT»iuS) (I),I=^U30) 

Ti 

jj'> ro»^.v/iT f • ) •// I r-AXiS:- ••10a*») 
j;i j^mp^aTi* '.f l<),l,lx,l'JlAl) 
I Or CM/. T < • • . 1 1 ^. 1 OUl I 

V'Z p ' j^vAM • • • 1 • » o,o» 7a, to. • • in • / X. 1 1 , o» ) 

® '^raN-.f 0 TO l.M vaL'J»"'5 i 

      END






Reports from the Institute of Education, Göteborg University



46. Gustafsson, Jan-Eric: Inconsistencies in aptitude-treatment
interactions as a function of procedures in the studies and
methods of analysis. Project MID 16, March 1976.

47. Gustafsson, Jan-Eric: Differential effects of imagery
instructions on pupils with different abilities. Project
MID 17, April 1976.

48. Ekholm, Mats: Social development in school. Summary and
excerpts. Project SOS 23, May 1976.

49. Stangvik, Gunnar: Approaches to the analysis of learner- 
task interactions and some implications for the study 

of pedagogical processes. Project yp 7, January 1976. 

50. Andrae, Annika: Non-graded instruction in small rural
lower secondary schools. A presentation of the PANG-project.
Paper read at the INTERSKOLA conference, July 1976.
Project PANG 20, July 1976.

51. Patriksson, Göran: Attitudes toward Olympic games of
Swedish adolescents. Paper presented at the international
congress of physical activity sciences in Quebec City,
11-16 July 1976. September 1976.

52. Gustafsson, Jan-Eric: A note on the importance of
studying class effects in aptitude-treatment interactions.
Project MID 19, September 1976.

53. Gustafsson, Jan-Eric: Spatial ability and the suppression
of visualization by reading. Project MID 20, September 1976.

54. Dahlgren, Lars Owe & Marton, Ference: Investigation into
the learning and teaching of basic concepts in economics -
a research project on higher education. Paper presented
at the Congress of the European Association for Research
and Development in Higher Education, August 30 - September 3,
1976, Louvain-la-Neuve, Belgium. September 1976.

55. Marton, Ference: Skill as an aspect of knowledge. Some
implications from research on students' conceptions of
central phenomena in their subjects. Paper presented at
the Second International Conference on Improving University
Teaching, July 13-16, 1976, Heidelberg, F.R. Germany.
September 1976.

56. Andrae, Annika (Ed.): Non-graded instruction. Research
organization and design. Administration and daily
teaching experiences in small rural lower secondary
schools. Experiences from the PANG-project. Paper read at
the INTERSKOLA conference in Sveg, July 1976. Project
PANG 23, August 1976.

57. Sandgren, Björn & Åsberg, Rodney: On cognition and social
change. A report from a pilot study regarding the effect
of schooling on cognitive growth and attitudes towards
social change in Pakistan. October 1976.




58. Sandgren, Björn: Relation between cognition and social
development. January 1977.

59. Entwistle, Noel: Changing approaches to research into
personality and learning. February 1977.

60. Härnqvist, Kjell: Primary Mental Abilities at Collective
and Individual Level. October 1977.

61. Härnqvist, Kjell: Enduring Effects of Schooling - A
Neglected Area in Educational Research. October 1977.

62. Härnqvist, Kjell: A Note on the Correlations between
Increments, Cumulated Attainment and a Predictor.
October 1977.

63. Gustafsson, Jan-Eric: The Rasch Model for Dichotomous
Items: Theory, Applications and a Computer Program.
December 1977.






